Abstract:
In many practical data mining scenarios, such as network intrusion detection, Twitter spam detection, and computer-aided diagnosis, source domain that is different but related to a target domain is very common. Generally, a large amount of unlabeled data is available in both source domain and target domain, but labeling each of them is difficult, expensive, time-consuming, and sometime unnecessary. Therefore, it is very important and worthwhile to fully explore the labeled and unlabeled data in source domain and target domain to handle classification tasks in target domain. To leverage transfer learning and semi-supervised learning, we propose a new inductive transfer learning framework named Co-Transfer. Co-Transfer first generates three TrAdaBoost classifiers for transfer learning from the original source domain to the original target domain, and meanwhile another three TrAdaBoost classifiers are generated for transfer learning from the original target domain to the original source domain by bootstrapping samples from the original labeled data. In each round of Co-Transfer, each group of TrAdaBoost classifiers is refined by using the carefully labeled data, one part of which is the original labeled samples, the second part is the samples labeled by one group of TrAdaBoost classifiers, and the other samples are labeled by another group of TrAdaBoost classifiers. Finally, the group of TrAdaBoost classifiers learned to transfer from the original source domain to the original target domain to produce the final hypothesis. Experimental results on UCI and text classification task datasets illustrate that Co-Transfer can significantly improve generalization performance by exploring labeled and unlabeled data across different tasks.