Multi-Label Feature Selection Algorithm Based on Information Entropy

Zhang Zhenhai, Li Shining, Li Zhigang, and Chen Hao

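Abstract: In multi-label classification, feature selection is an important means of improving the performance of multi-label classifiers. Existing multi-label feature selection algorithms have high computational complexity and cannot produce a reasonable feature subset. To address these problems, a multi-label feature selection algorithm based on information entropy is proposed. The algorithm assumes that features are mutually independent, uses the information gain between a feature and the label set to measure how important the feature is with respect to the label set, and on this basis introduces an information gain threshold selection method. It first computes the information gain between each feature and the label set, then applies the threshold selection algorithm to obtain a reasonable threshold, and finally removes irrelevant features according to that threshold, yielding a reasonable feature subset. Experimental results with two different classifiers on four multi-label datasets show that the feature selection algorithm effectively improves the classification performance of multi-label classifiers.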
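
The scoring step rests on information gain. The Python sketch below computes IG(Y; F) = H(Y) - H(Y | F) for one feature F and one label Y. It assumes the feature is already discrete (continuous features would need to be discretized first), and the function names are illustrative rather than taken from the paper.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy H(X), in bits, of a sequence of discrete values."""
    n = len(values)
    counts = Counter(values)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def conditional_entropy(label, feature):
    """H(Y | F): entropy of the label within each feature value, weighted by P(F = v)."""
    n = len(label)
    groups = {}
    for f, y in zip(feature, label):
        groups.setdefault(f, []).append(y)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


def information_gain(feature, label):
    """IG(Y; F) = H(Y) - H(Y | F) for one discrete feature and one label."""
    return entropy(label) - conditional_entropy(label, feature)


print(information_gain([0, 0, 1, 1], [0, 0, 1, 1]))  # → 1.0
print(information_gain([0, 1, 0, 1], [0, 0, 1, 1]))  # → 0.0
```

A feature that fully determines a balanced binary label recovers the label's entire 1 bit of entropy, while a feature independent of the label recovers nothing.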

Abstract: Multi-label classification is the learning problem in which each instance is associated with a set of labels. Feature selection can eliminate redundant and irrelevant features in multi-label classification and thereby improve the performance of multi-label classifiers. However, existing feature selection methods have high computational complexity and cannot produce a reasonable feature subset. This paper therefore proposes a novel multi-label feature selection algorithm based on information entropy. The algorithm assumes that features are independent of each other. Its main ideas are: 1) the information gain between a feature and the label set is derived from the information gain between the feature and each individual label, and is used to measure the degree of correlation between the feature and the label set; 2) a threshold selection method is used to choose a reasonable feature subset from the original features. The algorithm first computes the information gain between each feature and the label set, and then removes irrelevant and redundant features according to the information gain threshold determined by the threshold selection method. Experiments are conducted on four different datasets with two different classifiers. The experimental results and their analysis show that the proposed algorithm can effectively improve the performance of multi-label classifiers.
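
The three steps above (score each feature against the label set, choose a threshold, remove features below it) can be sketched end to end in Python. Two parts are assumptions, because the abstract does not give the formulas: the feature-to-label-set gain is taken here as the sum of the per-label gains, and the threshold defaults to the mean score, a hypothetical stand-in for the paper's threshold selection method. Each feature is scored on its own, in line with the stated assumption that features are mutually independent.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def information_gain(feature, label):
    """IG(Y; F) = H(Y) - H(Y | F) for one discrete feature and one label."""
    n = len(label)
    groups = {}
    for f, y in zip(feature, label):
        groups.setdefault(f, []).append(y)
    h_cond = sum(len(ys) / n * entropy(ys) for ys in groups.values())
    return entropy(label) - h_cond


def label_set_gain(feature, labels):
    # ASSUMPTION: the gain with respect to the whole label set is taken as the
    # sum of the per-label gains; the abstract does not state the exact formula.
    return sum(information_gain(feature, y) for y in labels)


def mean_threshold(scores):
    # HYPOTHETICAL stand-in for the paper's threshold selection method.
    return sum(scores) / len(scores)


def select_features(features, labels, threshold_rule=mean_threshold):
    """Score every feature independently, choose a threshold, drop low scorers.

    features: list of feature columns, each a list of discrete values
    labels:   list of label columns, each a list of 0/1 values
    Returns (indices of kept features, all scores, threshold used).
    """
    scores = [label_set_gain(f, labels) for f in features]
    threshold = threshold_rule(scores)
    kept = [j for j, s in enumerate(scores) if s >= threshold]
    return kept, scores, threshold


labels = [[0, 0, 1, 1], [0, 1, 0, 1]]  # two labels over four instances
features = [
    [0, 0, 1, 1],  # mirrors label 0
    [0, 1, 0, 1],  # mirrors label 1
    [0, 0, 0, 0],  # constant, carries no information about either label
]
kept, scores, threshold = select_features(features, labels)
print(scores)  # → [1.0, 1.0, 0.0]
print(kept)    # → [0, 1]
```

Scoring needs one information-gain computation per feature-label pair, and any other cut-off rule can be passed as `threshold_rule` without touching the scoring step.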