ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development) ›› 2017, Vol. 54 ›› Issue (1): 63-70. doi: 10.7544/issn1000-1239.2017.20150796

• Artificial Intelligence •

Fast Kernel-Based Partial Label Learning Algorithm Based on Variational Gaussian Process Model

Zhou Yu1, He Jianjun1,2, Gu Hong1

  1. Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024
  2. College of Information and Communication Engineering, Dalian Minzu University, Dalian, Liaoning 116600
  • E-mail: yuzhou829@sina.com
  • Published online: 2017-01-01
  • Funding:
    This work was supported by the National Natural Science Foundation of China (61503058, 61502074, U1560102), the Natural Science Foundation of Liaoning Province of China (201602190), and the Fundamental Research Funds for the Central Universities (DC201501055, DC201501060201).

Abstract: Partial label learning is a recently proposed weakly supervised machine learning framework. It relaxes the requirements on the training set: a model can be learned when each training example is associated only with a candidate label set that contains its ground-truth label, which makes the framework convenient for many real-world tasks. Because the label information of the training data is no longer unique or unambiguous, partial label learning is harder than traditional classification, and only a few algorithms, all aimed at small-scale training sets, have been available so far. This paper presents a fast kernel-based partial label learning algorithm for large-scale data, built on error-correcting output codes (ECOC) and the variational Gaussian process model. The original partial label training set is first converted into several standard two-class data sets using ECOC, and a binary classifier with low computational complexity is then built on each two-class data set using the variational Gaussian process model. Experimental results show that the proposed algorithm achieves almost the same prediction accuracy as existing kernel-based partial label learning algorithms while requiring far less training time. In particular, it handles problems with millions of samples within 40 minutes on an ordinary personal computer.

Key words: partial label learning, kernel method, large-scale data, Gaussian process, classification
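
The conversion step described in the abstract can be illustrated with a minimal sketch. The abstract does not state the exact coding rule, so the rule used here is an assumption taken from a common ECOC treatment of partial labels: for each column of the coding matrix, an example is kept as positive if its whole candidate set lies on the positive side of the split, kept as negative if it lies entirely on the negative side, and dropped otherwise. The function name `ecoc_binary_datasets` and its arguments are hypothetical, and the variational Gaussian process classifier that would be trained on each resulting data set is not shown.

```python
import numpy as np


def ecoc_binary_datasets(X, candidate_sets, coding_matrix):
    """Split a partial label training set into one binary data set per ECOC column.

    X              : (n, d) feature array.
    candidate_sets : list of n sets of class indices (each holds the true label).
    coding_matrix  : (q, L) array with entries +1 / -1; column j splits the q classes.

    Assumed rule (not confirmed by the abstract): keep an example for a column
    only when its candidate set falls entirely on one side of that column's split.
    """
    datasets = []
    for column in coding_matrix.T:
        positive_classes = {c for c, bit in enumerate(column) if bit == 1}
        rows, targets = [], []
        for i, candidates in enumerate(candidate_sets):
            if candidates <= positive_classes:
                rows.append(i)
                targets.append(1)
            elif candidates.isdisjoint(positive_classes):
                rows.append(i)
                targets.append(-1)
            # Otherwise the candidate set straddles the split and is skipped.
        rows = np.array(rows, dtype=int)
        datasets.append((X[rows], np.array(targets, dtype=int)))
    return datasets


# Toy example: 4 samples, 3 classes, 2 code columns.
X = np.arange(8, dtype=float).reshape(4, 2)
candidate_sets = [{0, 1}, {2}, {1, 2}, {0}]
coding_matrix = np.array([[1, 1], [1, -1], [-1, 1]])
datasets = ecoc_binary_datasets(X, candidate_sets, coding_matrix)
```

Each returned pair is an ordinary two-class data set, so any binary classifier can be trained on it; the paper uses a variational Gaussian process model for this step.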

CLC number: