
    基于变分高斯过程模型的快速核偏标记学习算法

    Fast Kernel-Based Partial Label Learning Algorithm Based on Variational Gaussian Process Model

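    • Abstract (Chinese version): Partial label learning is a recently proposed weakly supervised machine learning framework. Because it relaxes the conditions for constructing a training set, requiring only a candidate set that contains each training sample's true label, it can handle practical problems in many domains more conveniently. Under this framework the label information of the training data is no longer unique or unambiguous, which makes learning algorithms harder to construct than for traditional classification; so far only a few algorithms aimed at small-scale training data have been developed. This paper first uses ECOC to convert the original partial label training set into several standard binary classification data sets, then builds a binary classifier with relatively low computational complexity on each of them based on the variational Gaussian process model, and thereby obtains a fast kernel-based partial label learning algorithm for large-scale data. Simulation results show that, at nearly the same prediction accuracy, the proposed algorithm needs far less training time than existing kernel-based partial label learning algorithms; on an ordinary PC it handles problems with millions of samples in only 40 min.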

       

      Abstract: Partial label learning is a recently proposed weakly supervised machine learning framework. It relaxes the requirements on the training data set: a model can be learned when each training example is associated only with a candidate set of labels that contains the ground-truth label. This makes the framework more convenient for many real-world tasks. However, the resulting ambiguity in the training labels makes partial label learning harder than traditional classification, and only a few algorithms, all aimed at small-scale training sets, have been available so far. Based on error-correcting output codes (ECOC) and the variational Gaussian process model, this paper presents a fast kernel-based partial label learning algorithm that can handle large-scale problems effectively. The basic strategy is first to convert the original partial label training set into several standard binary data sets using ECOC, and then to build a binary classifier with relatively low computational complexity on each binary data set using the variational Gaussian process model. Experimental results show that the proposed algorithm achieves almost the same accuracy as existing state-of-the-art kernel-based partial label learning algorithms while requiring far less training time. Specifically, it can handle problems with millions of samples within 40 minutes on an ordinary personal computer.
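
      The ECOC conversion step can be illustrated with a minimal Python sketch. The abstract does not spell out the conversion rule, so the rule used here is an assumption borrowed from common ECOC-based partial label methods: for each column of a coding matrix, an example is labeled +1 if its whole candidate set lies in that column's positive classes, -1 if its whole candidate set lies in the negative classes, and is left out otherwise. The names `build_binary_datasets` and `decode`, the toy coding matrix, and the Hamming-distance decoding are all hypothetical; the variational Gaussian process classifier itself is not shown.

```python
from typing import List, Sequence, Set, Tuple


def build_binary_datasets(
    candidate_sets: Sequence[Set[int]],
    coding_matrix: Sequence[Sequence[int]],
) -> List[List[Tuple[int, int]]]:
    """Turn a partial label training set into one binary data set per ECOC column.

    coding_matrix[c][j] is +1 or -1: the code of class c in column j.
    Returns, for each column, a list of (example_index, binary_label) pairs.
    ASSUMPTION: the subset/disjoint rule below is not taken from the abstract.
    """
    n_columns = len(coding_matrix[0])
    datasets = []
    for j in range(n_columns):
        # Classes that column j assigns to the positive side of the dichotomy.
        positive = {c for c, row in enumerate(coding_matrix) if row[j] == +1}
        column_data = []
        for i, candidates in enumerate(candidate_sets):
            if candidates <= positive:
                column_data.append((i, +1))   # every candidate is positive
            elif candidates.isdisjoint(positive):
                column_data.append((i, -1))   # every candidate is negative
            # Otherwise the candidates straddle both sides: skip the example.
        datasets.append(column_data)
    return datasets


def decode(binary_outputs: Sequence[int],
           coding_matrix: Sequence[Sequence[int]]) -> int:
    """Pick the class whose codeword is closest (Hamming distance) to the outputs."""
    def hamming(row: Sequence[int]) -> int:
        return sum(a != b for a, b in zip(row, binary_outputs))
    return min(range(len(coding_matrix)), key=lambda c: hamming(coding_matrix[c]))


# Toy example: 3 classes, 2 ECOC columns, 4 training examples.
coding_matrix = [[+1, +1],   # class 0
                 [+1, -1],   # class 1
                 [-1, -1]]   # class 2
candidate_sets = [{0, 1}, {2}, {1, 2}, {0}]
datasets = build_binary_datasets(candidate_sets, coding_matrix)
```

      In the approach the abstract describes, each resulting binary data set would be passed to a binary classifier based on the variational Gaussian process model. At prediction time the binary outputs have to be combined into a single class, for example with a decoder such as `decode`.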

       
