ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (1): 56-65. doi: 10.7544/issn1000-1239.2015.20140544

Special Topic: 2015 Excellent Young Scientists Special Topic

• Artificial Intelligence •

Feature Selection for Multi-Label Classification Based on Neighborhood Rough Sets

Duan Jie1, Hu Qinghua1, Zhang Lingjun1, Qian Yuhua2, Li Deyu2

  1. School of Computer Science and Technology, Tianjin University, Tianjin 300072
  2. School of Computer and Information Technology, Shanxi University, Taiyuan 030006
  (huqinghua@tju.edu.cn)
  • Published: 2015-01-01
  • Online: 2015-01-01
  • Funding:
    National Natural Science Foundation of China, Excellent Young Scientists Fund (61222210) | National Natural Science Foundation of China, Key Program (61432011) | National Natural Science Foundation of China, General Program (61272095)

Abstract: Multi-label learning is a class of complex decision-making tasks in which one object may belong to several categories at the same time. Such tasks are common in text categorization, image recognition, and gene function analysis. Multi-label classification tasks are usually described by high-dimensional feature vectors that contain many irrelevant and redundant features. Many feature selection algorithms have been developed for single-label classification to address the curse of dimensionality, but attribute reduction and feature selection for multi-label data have received little attention. In this work, we apply rough sets to feature selection for multi-label data. For multi-label classification tasks, we redefine the lower approximation and the dependency computation of neighborhood rough sets and discuss the properties of the resulting model. We then construct a neighborhood rough set based feature selection algorithm for multi-label classification and report experimental results on public data sets. The experimental analysis shows the effectiveness of the proposed algorithm.

Key words: multi-label classification, feature selection, neighborhood rough sets, dependency
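
The abstract names the ingredients (a neighborhood lower approximation, a dependency measure, and a selection algorithm built on them) but does not give the redefined formulas. The following is only a minimal sketch under stated assumptions, not the authors' definitions: it assumes a sample is consistent on a label when every sample in its δ-neighborhood agrees on that label, takes the dependency as the consistent fraction averaged over labels, and adds features greedily while the dependency increases. The names `neighborhood`, `dependency`, `select_features`, `delta`, and `eps` are hypothetical.

```python
from math import sqrt


def neighborhood(X, i, feats, delta):
    """Indices of samples within Euclidean distance delta of sample i on feats."""
    return [
        j for j in range(len(X))
        if sqrt(sum((X[i][f] - X[j][f]) ** 2 for f in feats)) <= delta
    ]


def dependency(X, Y, feats, delta):
    """Assumed multi-label dependency (not the paper's formula): the fraction of
    (sample, label) pairs whose delta-neighborhood agrees with the sample on
    that label."""
    if not feats:
        return 0.0
    n, n_labels = len(X), len(Y[0])
    consistent = 0
    for i in range(n):
        nbrs = neighborhood(X, i, feats, delta)
        for l in range(n_labels):
            if all(Y[j][l] == Y[i][l] for j in nbrs):
                consistent += 1
    return consistent / (n * n_labels)


def select_features(X, Y, delta=0.2, eps=1e-9):
    """Greedy forward selection: add the feature that raises the dependency
    most, and stop when no feature improves it by more than eps."""
    selected, best = [], 0.0
    remaining = list(range(len(X[0])))
    while remaining:
        scores = {f: dependency(X, Y, selected + [f], delta) for f in remaining}
        f_best = max(remaining, key=scores.get)  # ties go to the lowest index
        if scores[f_best] - best <= eps:
            break
        selected.append(f_best)
        remaining.remove(f_best)
        best = scores[f_best]
    return selected, best


# Toy data (made up for illustration): feature 0 determines label 0,
# feature 1 determines label 1, and feature 2 is constant noise.
X_toy = [[0.0, 0.0, 0.5], [0.0, 1.0, 0.5], [1.0, 0.0, 0.5], [1.0, 1.0, 0.5]]
Y_toy = [[0, 0], [0, 1], [1, 0], [1, 1]]

selected, gamma = select_features(X_toy, Y_toy, delta=0.2)
print(selected, gamma)  # [0, 1] 1.0
```

On this toy data the constant feature is never selected, because adding it leaves the neighborhoods, and therefore the dependency, unchanged. The quadratic neighborhood search is kept for readability; a real data set would likely need a faster neighbor lookup.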

CLC Number: