ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (5): 1024-1035. doi: 10.7544/issn1000-1239.2017.20160351

• Artificial Intelligence •

Mutual Information Based Granular Feature Weighted k-Nearest Neighbors Algorithm for Multi-Label Learning

Li Feng, Miao Duoqian, Zhang Zhifei, Zhang Wei   

  1. (Department of Computer Science and Technology, Tongji University, Shanghai 201804) (Key Laboratory of Embedded Systems and Service Computing (Tongji University), Ministry of Education, Shanghai 201804) (tjleefeng@163.com)
  • Online: 2017-05-01
  • Supported by: National Natural Science Foundation of China (61273304, 61573255); Specialized Research Fund for the Doctoral Program of Higher Education of China (20130072130004); Natural Science Foundation of Shanghai (14ZR1442600)

Abstract: In traditional kNN-based multi-label learning algorithms, all features contribute equally when computing the distance between a pair of instances to find the nearest neighbors. Furthermore, most of these algorithms transform the multi-label problem into a set of single-label binary problems, ignoring label correlations. The performance of a multi-label learning algorithm depends heavily on the input features, and different features carry different amounts of knowledge about the label classification, so features should be given different importance. Mutual information is one of the most widely used measures of the dependency between two variables, and it can evaluate the knowledge a feature contains about the label classification. Therefore, we propose a granular feature weighted k-nearest neighbors algorithm for multi-label learning (GFWML-kNN) based on mutual information, which assigns each feature a weight according to the knowledge it contains. The proposed algorithm first granulates the label space into several label information granules to avoid the label combination explosion problem, and then calculates feature weights for each label information granule, taking label combinations into consideration so as to merge label correlations into the feature weights. The experimental results show that the proposed algorithm achieves better overall performance than several classical multi-label learning algorithms.
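The weighting idea described in the abstract can be illustrated with a minimal, self-contained sketch. This is not the paper's exact GFWML-kNN procedure: the granulation of the label space and the per-granule weight computation are omitted, and the data, function names, and majority-vote prediction rule below are illustrative assumptions. The sketch only shows the core idea of weighting each discrete feature by its mutual information with a label variable and then using those weights in the kNN distance.

```python
# Illustrative sketch (not the paper's exact algorithm): weight each
# discrete feature by its mutual information with a label variable,
# then use the weights in a kNN distance for multi-label prediction.
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def feature_weights(X, y):
    """Weight each (discrete) feature column of X by its mutual
    information with the label sequence y, normalized to sum to 1."""
    mis = [mutual_information(col, y) for col in zip(*X)]
    total = sum(mis) or 1.0  # avoid division by zero
    return [m / total for m in mis]

def weighted_knn_predict(X, Y, weights, query, k=3):
    """Find the k nearest training instances under an MI-weighted
    squared Euclidean distance; predict the labels carried by a
    majority of those neighbors (a simple illustrative vote)."""
    def dist(a, b):
        return sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b))
    neighbors = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    votes = Counter(label for i in neighbors for label in Y[i])
    return sorted(label for label, c in votes.items() if c > k / 2)
```

For example, on a toy set where the first feature determines the label and the second is noise, `feature_weights` concentrates all weight on the first feature, so the weighted kNN effectively ignores the noisy one. The actual GFWML-kNN would repeat the weight computation once per label granule rather than once globally.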

Key words: mutual information, feature weight, granulation, multi-label learning, k-nearest neighbors
