ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2017, Vol. 54 ›› Issue (5): 1024-1035. doi: 10.7544/issn1000-1239.2017.20160351

• Artificial Intelligence •

Mutual Information Based Granular Feature Weighted k-Nearest Neighbors Algorithm for Multi-Label Learning

Li Feng, Miao Duoqian, Zhang Zhifei, Zhang Wei   

  1. (Department of Computer Science and Technology, Tongji University, Shanghai 201804) (Key Laboratory of Embedded Systems and Service Computing (Tongji University), Ministry of Education, Shanghai 201804) (tjleefeng@163.com)
  • Online: 2017-05-01
  • Supported by: National Natural Science Foundation of China (61273304, 61573255); Specialized Research Fund for the Doctoral Program of Higher Education of China (20130072130004); Natural Science Foundation of Shanghai (14ZR1442600)

Abstract: In traditional kNN-based multi-label learning algorithms, all features contribute equally when computing the distance between a pair of instances to find the nearest neighbors. Furthermore, most of these algorithms transform the multi-label problem into a set of single-label binary problems, ignoring label correlations. The performance of a multi-label learning algorithm depends heavily on the input features, and different features carry different amounts of knowledge about the label classification, so features should be given different importance. Mutual information is one of the most widely used measures of the dependency between two variables, and it can evaluate the knowledge a feature contains about the label classification. Therefore, we propose a granular feature weighted k-nearest neighbors algorithm for multi-label learning (GFWML-kNN) based on mutual information, which assigns each feature a weight according to the knowledge it contains. The proposed algorithm first granulates the label space into several label information granules to avoid the label combination explosion problem, and then calculates feature weights for each label information granule, taking label combinations into consideration so as to merge label correlations into the feature weights. The experimental results show that the proposed algorithm achieves better overall performance than several classical multi-label learning algorithms.
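The weighting idea described in the abstract can be illustrated with a minimal, self-contained sketch. This is not the paper's exact GFWML-kNN procedure: the granulation of the label space and the per-granule weight computation are omitted, and the data, function names, and majority-vote prediction rule below are illustrative assumptions. The sketch only shows the core idea of weighting each discrete feature by its mutual information with a label variable and then using those weights in the kNN distance.

```python
# Illustrative sketch (not the paper's exact algorithm): weight each
# discrete feature by its mutual information with a label variable,
# then use the weights in a kNN distance for multi-label prediction.
import math
from collections import Counter

def mutual_information(xs, ys):
    """I(X;Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    mi = 0.0
    for (x, y), c in pxy.items():
        p_xy = c / n
        mi += p_xy * math.log2(p_xy / ((px[x] / n) * (py[y] / n)))
    return mi

def feature_weights(X, y):
    """Weight each (discrete) feature column of X by its mutual
    information with the label sequence y, normalized to sum to 1."""
    mis = [mutual_information(col, y) for col in zip(*X)]
    total = sum(mis) or 1.0  # avoid division by zero
    return [m / total for m in mis]

def weighted_knn_predict(X, Y, weights, query, k=3):
    """Find the k nearest training instances under an MI-weighted
    squared Euclidean distance; predict the labels carried by a
    majority of those neighbors (a simple illustrative vote)."""
    def dist(a, b):
        return sum(w * (ai - bi) ** 2 for w, ai, bi in zip(weights, a, b))
    neighbors = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    votes = Counter(label for i in neighbors for label in Y[i])
    return sorted(label for label, c in votes.items() if c > k / 2)
```

For example, on a toy set where the first feature determines the label and the second is noise, `feature_weights` concentrates all weight on the first feature, so the weighted kNN effectively ignores the noisy one. The actual GFWML-kNN would repeat the weight computation once per label granule rather than once globally.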

Key words: mutual information, feature weight, granulation, multi-label learning, k-nearest neighbors
