Feature Selection for Multi-Label Classification Based on Neighborhood Rough Sets

Duan Jie; Hu Qinghua; Zhang Lingjun; Qian Yuhua; Li Deyu

doi:10.7544/issn1000-1239.2015.20140544


Abstract

Multi-label learning is a class of complex decision-making tasks in which a single object may belong to several categories at once. Such tasks arise widely in text categorization, image recognition, and gene function analysis. Multi-label data are usually described by high-dimensional feature vectors that contain a large amount of irrelevant and redundant information. Many feature selection algorithms have been proposed for single-label classification to counter the curse of dimensionality, but attribute reduction and feature selection for multi-label data have received little attention. In this work, we apply rough sets to feature selection for multi-label data. For multi-label classification, we redefine the lower approximation and the dependency measure of neighborhood rough sets and discuss the properties of the resulting model. On this basis, we construct a neighborhood-rough-set-based feature selection algorithm for multi-label classification and report experimental results on public datasets. The experimental analysis indicates that the proposed algorithm is effective.
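The abstract names the ingredients (a neighborhood-based lower approximation, a dependency measure, and a selection algorithm built on them) but does not give their definitions. The following is only a minimal sketch of the general idea under stated assumptions: a sample is counted as belonging to the lower approximation when every sample in its neighborhood carries exactly the same label set, dependency is the fraction of such samples, and features are added greedily while dependency improves. The function names, the Euclidean distance, the radius `delta`, and the consistency rule are all illustrative choices, not the definitions used in the paper.

```python
import math

def neighborhood(X, i, feats, delta):
    """Indices of samples within Euclidean distance delta of sample i,
    measured only on the feature subset feats (assumed metric)."""
    return [j for j in range(len(X))
            if math.sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in feats)) <= delta]

def dependency(X, Y, feats, delta):
    """Fraction of samples whose whole neighborhood shares their label set.
    ASSUMPTION: 'consistent' means identical label vectors; the paper's
    redefined multi-label lower approximation may differ."""
    consistent = sum(
        all(Y[j] == Y[i] for j in neighborhood(X, i, feats, delta))
        for i in range(len(X))
    )
    return consistent / len(X)

def greedy_select(X, Y, delta, eps=1e-12):
    """Forward selection: repeatedly add the feature that raises
    dependency the most; stop when no feature gives a gain above eps."""
    remaining = list(range(len(X[0])))
    selected, best = [], 0.0
    while remaining:
        scores = [(dependency(X, Y, selected + [f], delta), f) for f in remaining]
        top_score, top_f = max(scores, key=lambda s: s[0])
        if top_score - best <= eps:
            break
        selected.append(top_f)
        remaining.remove(top_f)
        best = top_score
    return selected

# Toy data: feature 0 separates the two label sets, feature 1 does not.
X = [[0.0, 0.10], [0.05, 0.90], [1.0, 0.12], [0.95, 0.88]]
Y = [(1, 0), (1, 0), (0, 1), (0, 1)]
print(greedy_select(X, Y, delta=0.2))
```

On this toy data, feature 0 alone already makes every neighborhood label-consistent, so the greedy loop stops after selecting it. A stricter or looser consistency rule (for example, per-label agreement instead of identical label sets) would change which samples count toward dependency.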
