Abstract:
Multi-label classification is the learning problem in which each instance is associated with a set of labels. Feature selection can eliminate redundant and irrelevant features in multi-label classification, improving the performance of multi-label classifiers. However, existing feature selection methods have high computational complexity and are unable to produce a reasonable feature subset. Hence, this paper proposes a novel multi-label feature selection algorithm based on information entropy. The algorithm assumes that features are independent of each other. Its main ideas are: 1) the information gain between a feature and the label set is derived from the information gain between that feature and each individual label, and is used to measure the degree of correlation between them; 2) a threshold selection method is used to choose a reasonable feature subset from the original features. The proposed algorithm first computes the information gain between each feature and the label set, and then removes irrelevant and redundant features according to the information gain threshold determined by the threshold selection method. Experiments are conducted on four different datasets with two different classifiers. The experimental results and their analysis show that the proposed algorithm effectively improves the performance of multi-label classifiers in multi-label classification.
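The two steps described above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: it assumes discrete features and binary labels, aggregates the per-label information gain of each feature over the whole label set (consistent with the feature-independence assumption), and uses a simple user-supplied threshold in place of the paper's threshold selection method. All function names here are hypothetical.

```python
from collections import Counter
from math import log2

def entropy(values):
    # Shannon entropy (base 2) of a discrete sequence.
    n = len(values)
    return -sum(c / n * log2(c / n) for c in Counter(values).values())

def information_gain(feature, label):
    # IG(f; l) = H(l) - H(l | f): reduction in label uncertainty
    # after observing the feature.
    h_cond = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, label) if f == v]
        h_cond += len(subset) / len(label) * entropy(subset)
    return entropy(label) - h_cond

def multilabel_ig(features, labels):
    # Score each feature by summing its information gain with every
    # label in the label set (features treated as independent).
    scores = []
    for f_col in zip(*features):
        scores.append(sum(information_gain(f_col, l_col)
                          for l_col in zip(*labels)))
    return scores

def select_features(features, labels, threshold):
    # Keep indices of features whose aggregated information gain
    # exceeds the threshold; irrelevant features score near zero.
    scores = multilabel_ig(features, labels)
    return [j for j, s in enumerate(scores) if s > threshold], scores
```

For example, on a toy dataset where the first feature exactly determines the single label and the second feature is constant, the first feature receives an aggregated gain of 1 bit and the second a gain of 0, so only the first survives a threshold of 0.5.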