ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展)


A New Semi-Supervised Inductive Transfer Learning Framework: Co-Transfer

Wen Yimin, Yuan Zhe, Yu Hang

  1. (School of Computer Science and Information Safety, Guilin University of Electronic Technology, Guilin, Guangxi 541004) (Guangxi Key Laboratory of Image and Graphic Intelligent Processing (Guilin University of Electronic Technology), Guilin, Guangxi 541004) (School of Computer Engineering and Science, Shanghai University, Shanghai 200444) (ymwen@guet.edu.cn)
  • Online: 2022-08-24

Abstract: In many practical data mining scenarios, such as network intrusion detection, Twitter spam detection, and computer-aided diagnosis, a source domain whose distribution differs from, but is related to, that of the target domain is commonly available. A large amount of unlabeled data usually exists in both the source and target domains, but labeling every sample is difficult, expensive, time-consuming, and sometimes unnecessary. It is therefore important and worthwhile to fully exploit both the labeled and the unlabeled data in the source and target domains to solve classification tasks in the target domain. Combining inductive transfer learning and semi-supervised learning, this paper proposes a semi-supervised inductive transfer learning framework named Co-Transfer. Co-Transfer first generates three TrAdaBoost classifiers for transfer learning from the original source domain to the original target domain, and at the same time generates another three TrAdaBoost classifiers for transfer learning from the original target domain to the original source domain. Both groups of classifiers are trained on bootstrap samples (sampling with replacement) drawn from the original labeled data of the original source and target domains. In each round of Co-Transfer, each group of TrAdaBoost classifiers is updated with a new training set made up of three parts: the original labeled samples, samples labeled by the group itself, and samples labeled by the other group of TrAdaBoost classifiers. After the iterations terminate, the ensemble of the three TrAdaBoost classifiers that transfer from the original source domain to the original target domain serves as the classifier for the original target domain. Experimental results on UCI datasets and text classification datasets show that Co-Transfer can effectively exploit the labeled and unlabeled data of the source and target domains to improve generalization performance. Code is available at https://gitee.com/ymw12345/co-transfer.git.

Key words: semi-supervised learning, transfer learning, multi-task learning, bi-directional transfer, ensemble learning

CLC number:
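The bi-directional training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration of the control flow only, not the authors' implementation (their code is at the Gitee URL above). Several things here are assumptions that the abstract does not specify: `WeightedCentroid` is a simple stand-in learner and is NOT TrAdaBoost; pseudo-labels are accepted only when all three classifiers of a group agree; the number of rounds is fixed; the augmented sets are re-bootstrapped each round; and labels are binary (0/1). All function and class names are hypothetical.

```python
import numpy as np

class WeightedCentroid:
    """Stand-in transfer learner (NOT TrAdaBoost): nearest class centroid,
    with auxiliary-domain samples down-weighted by `aux_weight` (assumed value)."""
    def fit(self, X_main, y_main, X_aux, y_aux, aux_weight=0.3):
        X = np.vstack([X_main, X_aux])
        y = np.concatenate([y_main, y_aux])
        w = np.concatenate([np.ones(len(y_main)), np.full(len(y_aux), aux_weight)])
        self.classes_ = np.unique(y)
        self.centroids_ = np.array(
            [np.average(X[y == c], axis=0, weights=w[y == c]) for c in self.classes_])
        return self

    def predict(self, X):
        # Squared distance from every sample to every class centroid.
        d = ((X[:, None, :] - self.centroids_[None, :, :]) ** 2).sum(axis=2)
        return self.classes_[d.argmin(axis=1)]

def train_group(main, aux, rng, n_clf=3):
    """Train n_clf learners, each on a bootstrap sample of the (X, y) pairs
    `main` (domain being predicted) and `aux` (domain being transferred from)."""
    group = []
    for _ in range(n_clf):
        i = rng.integers(0, len(main[1]), len(main[1]))  # sample with replacement
        j = rng.integers(0, len(aux[1]), len(aux[1]))
        group.append(WeightedCentroid().fit(main[0][i], main[1][i], aux[0][j], aux[1][j]))
    return group

def agreed_labels(group, X):
    """Assumed selection rule: keep samples on which all classifiers agree."""
    votes = np.stack([clf.predict(X) for clf in group])
    return (votes == votes[0]).all(axis=0), votes[0]

def majority_vote(group, X):
    """Ensemble prediction for binary 0/1 labels."""
    votes = np.stack([clf.predict(X) for clf in group])
    return (votes.mean(axis=0) >= 0.5).astype(int)

def co_transfer(src, tgt, U_src, U_tgt, rounds=3, seed=0):
    """src/tgt are labeled (X, y) pairs; U_src/U_tgt are unlabeled feature arrays."""
    rng = np.random.default_rng(seed)
    s2t = train_group(tgt, src, rng)  # source -> target: predicts target labels
    t2s = train_group(src, tgt, rng)  # target -> source: predicts source labels
    for _ in range(rounds):
        mt, lt = agreed_labels(s2t, U_tgt)  # target samples labeled by s2t
        ms, ls = agreed_labels(t2s, U_src)  # source samples labeled by t2s
        tgt_aug = (np.vstack([tgt[0], U_tgt[mt]]), np.concatenate([tgt[1], lt[mt]]))
        src_aug = (np.vstack([src[0], U_src[ms]]), np.concatenate([src[1], ls[ms]]))
        # Each group sees original labels, its own pseudo-labels, and the other group's.
        s2t = train_group(tgt_aug, src_aug, rng)
        t2s = train_group(src_aug, tgt_aug, rng)
    return s2t  # the source -> target group forms the final target classifier

def make_domain(rng, n, shift):
    """Synthetic two-class Gaussian data; `shift` mimics a distribution gap."""
    y = rng.integers(0, 2, n)
    X = rng.normal(size=(n, 2)) + np.where(y[:, None] == 1, 2.0, -2.0) + shift
    return X, y

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    src, tgt = make_domain(rng, 100, 0.5), make_domain(rng, 10, 0.0)
    U_src, U_tgt = make_domain(rng, 200, 0.5)[0], make_domain(rng, 200, 0.0)[0]
    X_test, y_test = make_domain(rng, 300, 0.0)
    final_group = co_transfer(src, tgt, U_src, U_tgt)
    print("target accuracy:", (majority_vote(final_group, X_test) == y_test).mean())
```

To move closer to the method the abstract describes, `WeightedCentroid` would be replaced by an actual TrAdaBoost learner with the same `fit`/`predict` shape; the surrounding loop would stay the same under the assumptions listed above.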