ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (6): 1171-1184. doi: 10.7544/issn1000-1239.2017.20170002

Special Topic: 2017 Excellent Young Scholars Special Issue

• Artificial Intelligence •

Label Enhancement for Label Distribution Learning

Geng Xin, Xu Ning, Shao Ruifeng   

  1. (School of Computer Science and Engineering, Southeast University, Nanjing 211189) (Key Laboratory of Computer Network and Information Integration (Southeast University), Ministry of Education, Nanjing 211189) (Collaborative Innovation Center of Novel Software Technology and Industrialization (Nanjing University), Nanjing 210093) (Collaborative Innovation Center of Wireless Communications Technology (Southeast University), Nanjing 211189) (xgeng@seu.edu.cn)
  • Online: 2017-06-01
  • Supported by: the Excellent Young Scientists Fund of the National Natural Science Foundation of China (61622203) and the Natural Science Foundation of Jiangsu Province for Distinguished Young Scholars (BK20140022)

Abstract: Multi-label learning (MLL) deals with the case where each instance is associated with multiple labels, and its goal is to learn a mapping from instances to sets of relevant labels. Most existing MLL methods adopt the uniform label distribution assumption, i.e., every relevant (positive) label is considered equally important to the instance. However, in many real-world learning problems, different relevant labels often differ in importance. To address this issue, label distribution learning (LDL) describes the relative importance of each label with a label distribution and has achieved good results. Unfortunately, many datasets contain only simple logical labels rather than label distributions. One way to solve this problem is to transform the logical labels into label distributions by mining the label importance information hidden in the training examples, and then to improve prediction accuracy via LDL. This process of transforming logical labels into label distributions is defined as label enhancement for label distribution learning. This paper proposes the concept of label enhancement for the first time, gives its formal definition, surveys existing algorithms that can be used for label enhancement, and compares them experimentally. The experimental results show that label enhancement can effectively discover the differences in label importance hidden in the data and improve the performance of MLL.
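As an illustrative sketch only, and not any specific algorithm surveyed in the paper, the following Python/NumPy snippet shows the general idea of label enhancement: recovering a label distribution from logical labels by borrowing importance information from similar training instances. The function name label_enhance_knn, the k-nearest-neighbor weighting, and the softmax temperature are all assumptions made for this example.

# Illustrative sketch of label enhancement: turning logical (0/1) labels into
# label distributions by borrowing importance information from similar instances.
# This is NOT a specific algorithm from the paper; it only shows the general idea.
import numpy as np

def label_enhance_knn(X, L, k=5, temperature=1.0):
    """X: (n, d) feature matrix; L: (n, c) logical label matrix in {0, 1}.
    Returns D: (n, c) label distributions (rows are non-negative and sum to 1)."""
    n = X.shape[0]
    # Pairwise Euclidean distances between instances.
    dists = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dists, np.inf)           # exclude each instance from its own neighborhood
    D = np.zeros_like(L, dtype=float)
    for i in range(n):
        nbrs = np.argsort(dists[i])[:k]       # indices of the k nearest neighbors
        w = np.exp(-dists[i, nbrs])           # closer neighbors contribute more
        # Raw importance score: own logical labels plus weighted neighbor labels.
        score = L[i] + (w[:, None] * L[nbrs]).sum(axis=0) / (w.sum() + 1e-12)
        # Softmax over labels turns the scores into a distribution summing to 1.
        e = np.exp(score / temperature)
        D[i] = e / e.sum()
    return D

# Example: 4 instances, 3 labels. The logical labels give no importance ordering,
# while the recovered distributions assign each label a degree of importance.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]])
L = np.array([[1, 1, 0], [1, 0, 0], [0, 1, 1], [0, 0, 1]])
print(label_enhance_knn(X, L, k=1))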

Key words: multi-label learning (MLL), label distribution learning (LDL), label enhancement, logical label, label distribution
