一种新的半监督归纳迁移学习框架：Co-Transfer

文益民; 员喆; 余航

doi:10.7544/issn1000-1239.202220232

一种新的半监督归纳迁移学习框架：Co-Transfer

A New Semi-Supervised Inductive Transfer Learning Framework: Co-Transfer

摘要

摘要: 在许多实际的数据挖掘应用场景，如网络入侵检测、Twitter垃圾邮件检测、计算机辅助诊断等中，与目标域分布不同但相关的源域普遍存在. 一般情况下,在源域和目标域中都有大量未标记样本，对其中的每个样本都进行标记是件困难的、昂贵的、耗时的事，有时也没必要. 因此,充分挖掘源域和目标域中标记和未标记样本来解决目标域中的分类任务非常重要且有意义. 结合归纳迁移学习和半监督学习，提出一种名为Co-Transfer的半监督归纳迁移学习框架. Co-Transfer首先生成3个TrAdaBoost分类器用于实现从原始源域到原始目标域的迁移学习，同时生成另外3个TrAdaBoost分类器用于实现从原始目标域到原始源域的迁移学习. 这2组分类器都使用从原始源域和原始目标域的原有标记样本的有放回抽样来训练. 在Co-Transfer的每一轮迭代中，每组TrAdaBoost分类器使用新的训练集更新，其中一部分训练样本是原有的标记样本,一部分是由本组TrAdaBoost分类器标记的样本,还有一部分则由另一组TrAdaBoost分类器标记. 迭代终止后，把从原始源域到原始目标域的3个TrAdaBoost分类器的集成作为原始目标域分类器. 在UCI数据集和文本分类数据集上的实验结果表明，Co-Transfer可以有效地学习源域和目标域的标记和未标记样本从而提升泛化性能.

Abstract: In many practical data mining scenarios, such as network intrusion detection, Twitter spam detection, and computer-aided diagnosis, source domain that is different but related to a target domain is very common. Generally, a large amount of unlabeled data is available in both source domain and target domain, but labeling each of them is difficult, expensive, time-consuming, and sometime unnecessary. Therefore, it is very important and worthwhile to fully explore the labeled and unlabeled data in source domain and target domain to handle classification tasks in target domain. To leverage transfer learning and semi-supervised learning, we propose a new inductive transfer learning framework named Co-Transfer. Co-Transfer first generates three TrAdaBoost classifiers for transfer learning from the original source domain to the original target domain, and meanwhile another three TrAdaBoost classifiers are generated for transfer learning from the original target domain to the original source domain by bootstrapping samples from the original labeled data. In each round of Co-Transfer, each group of TrAdaBoost classifiers is refined by using the carefully labeled data, one part of which is the original labeled samples, the second part is the samples labeled by one group of TrAdaBoost classifiers, and the other samples are labeled by another group of TrAdaBoost classifiers. Finally, the group of TrAdaBoost classifiers learned to transfer from the original source domain to the original target domain to produce the final hypothesis. Experimental results on UCI and text classification task datasets illustrate that Co-Transfer can significantly improve generalization performance by exploring labeled and unlabeled data across different tasks.

HTML全文

参考文献(29)

施引文献

资源附件(0)