Fast Kernel-Based Partial Label Learning Algorithm Based on Variational Gaussian Process Model
Abstract: Partial label learning is a recently proposed weakly supervised machine learning framework. It relaxes the requirements on the training set: a model can be learned when each training example is associated only with a set of candidate labels that contains its ground-truth label. This makes the framework convenient for many real-world tasks. However, the label information of the training data is no longer unique and explicit, so learning algorithms are harder to design than for traditional classification. Up to now, only a few algorithms have been proposed, and all of them target small-scale training sets. This paper presents a fast kernel-based partial label learning algorithm for large-scale data. The algorithm first uses ECOC (error-correcting output codes) to convert the original partial-label training set into several standard two-class data sets. It then builds a binary classifier with low computational complexity on each two-class data set, based on a variational Gaussian process model. Simulation experiments show that the proposed algorithm achieves nearly the same predictive accuracy as existing state-of-the-art kernel-based partial label learning algorithms while requiring far less training time. On an ordinary PC, it handles problems with millions of samples within 40 minutes.
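The abstract names the two building blocks but not their details, so the following is only a minimal sketch of how they could fit together, under stated assumptions. The rule for deciding which examples enter each binary problem (keep an example only if its whole candidate set falls on one side of the ECOC column) is an assumed PL-ECOC-style rule, and the random coding matrix and score-based decoding are likewise assumptions. In place of the paper's variational Gaussian process classifier, the sketch swaps in Titsias-style sparse variational GP regression fitted to ±1 targets with fixed hyperparameters; this is a simplification, not the paper's model. It does show why such models scale: with m inducing points, fitting costs O(n m^2) instead of the O(n^3) of an exact GP. All names (`binary_dataset`, `SparseVariationalGP`, `fit_predict`, etc.) are hypothetical.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel: variance * exp(-||a - b||^2 / (2 * lengthscale^2)).
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return variance * np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def random_coding_matrix(num_labels, num_columns, rng):
    # Each row is one ECOC column: +1/-1 puts every label in the positive or negative group.
    rows = []
    while len(rows) < num_columns:
        col = rng.choice([-1, 1], size=num_labels)
        if (col == 1).any() and (col == -1).any():  # both groups must be non-empty
            rows.append(col)
    return np.array(rows)


def binary_dataset(X, candidate_sets, column):
    # Assumed PL-ECOC-style rule: an example is positive (negative) if its whole candidate
    # set lies in the positive (negative) group; otherwise it is left out of this column.
    keep, y = [], []
    for i, S in enumerate(candidate_sets):
        codes = {int(column[k]) for k in S}
        if len(codes) == 1:
            keep.append(i)
            y.append(float(codes.pop()))
    return X[np.array(keep, dtype=int)], np.array(y)


class SparseVariationalGP:
    """Titsias-style sparse variational GP regression fitted to +/-1 targets.

    Hypothetical stand-in for the paper's variational GP classifier: a Gaussian
    likelihood replaces a classification likelihood, and hyperparameters are fixed.
    """

    def __init__(self, num_inducing=20, noise=0.1, lengthscale=1.0, seed=0):
        self.num_inducing, self.noise = num_inducing, noise  # noise = Gaussian noise variance
        self.lengthscale, self.seed = lengthscale, seed

    def fit(self, X, y):
        rng = np.random.default_rng(self.seed)
        m = min(self.num_inducing, len(X))
        self.Z = X[rng.choice(len(X), size=m, replace=False)]  # inducing inputs: random subset
        Kmm = rbf_kernel(self.Z, self.Z, self.lengthscale) + 1e-6 * np.eye(m)  # jitter
        Kmn = rbf_kernel(self.Z, X, self.lengthscale)
        # Optimal q(u): Sigma = (Kmm + Kmn Knm / noise)^-1; forming Kmn Knm costs O(n m^2).
        A = Kmm + Kmn @ Kmn.T / self.noise
        # Predictive-mean weights: alpha = Sigma Kmn y / noise.
        self.alpha = np.linalg.solve(A, Kmn @ y) / self.noise
        return self

    def decision_function(self, X):
        # Predictive mean K_*m alpha; its sign is the predicted binary class.
        return rbf_kernel(X, self.Z, self.lengthscale) @ self.alpha


def fit_predict(X_train, candidate_sets, X_test, num_labels, num_columns=20, coding=None, seed=0):
    rng = np.random.default_rng(seed)
    M = random_coding_matrix(num_labels, num_columns, rng) if coding is None else np.asarray(coding)
    scores = np.zeros((len(X_test), num_labels))
    for column in M:
        Xb, yb = binary_dataset(X_train, candidate_sets, column)
        if len(yb) == 0:
            continue  # no unambiguous training example for this column
        f = SparseVariationalGP().fit(Xb, yb).decision_function(X_test)
        scores += np.outer(f, column)  # labels on the side f points to gain, the others lose
    return scores.argmax(axis=1)  # decode: label whose codeword best agrees with the outputs


if __name__ == "__main__":
    # Toy data: three well-separated clusters, each example has its true label plus one distractor.
    centers = np.array([[0.0, 0.0], [5.0, 0.0], [0.0, 5.0]])
    offsets = np.array([[dx, dy] for dx in (-0.2, 0.0, 0.2) for dy in (-0.2, 0.2)])
    X = np.vstack([c + offsets for c in centers])
    cands = [{c, (c + 1 + i % 2) % 3} for c in range(3) for i in range(len(offsets))]
    print(fit_predict(X, cands, centers, num_labels=3))
```

Because each binary problem only ever sees training points whose candidate sets are unambiguous for that column, no disambiguation of the candidate labels is needed before training; the decoding step then combines the per-column outputs into a single label.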
Keywords:
- partial label learning
- kernel method
- large-scale data
- Gaussian process
- classification