ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2022, Vol. 59 ›› Issue (1): 105-117. doi: 10.7544/issn1000-1239.20200765

• Artificial Intelligence •

Cross-Domain Adaptive Learning Model Based on Feature Separation

Li Xin (李鑫), Li Zhemin (李哲民), Wei Juhui (魏居辉), Yang Yating (杨雅婷), Wang Hongxia (王红霞)

  1. (College of Liberal Arts and Sciences, National University of Defense Technology, Changsha 410073) (li_xin@nudt.edu.cn)
  • Published: 2022-01-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61977065) and the Key Research and Development Program of the Ministry of Science and Technology (2020YFA0713504).

Abstract: Cross-domain training remains an open challenge in machine learning. Recent studies focus on exploiting the cross-domain invariance of real features to predict data from unseen domains, thereby achieving cross-domain generalization. In practice, however, when the domain of the data is known, using real features and false (spurious) features together yields better predictions. To address this, this paper designs CDGA (cross-domain generalization and adaptation model), a learning model suited to both cross-domain generalization and cross-domain adaptation tasks. The core of the model is still the separation of real features, so the paper proposes a new, more stable training risk function. In cross-domain generalization problems it achieves higher test accuracy, overcomes the tendency of existing methods to overfit, and can be embedded readily in the CDGA model. In addition, after training with the designed algorithm, the data representation part of the CDGA model effectively separates real features from false features, while the classifier part adaptively learns to select either the generalized classifier or an environment-specific classifier, thereby making use of false features to achieve efficient prediction in cross-domain tasks. Finally, the model is tested on a constructed Colored MNIST data set, where its results are significantly better than those of existing methods.

Key words: machine learning, feature expression, cross-domain training, generalization, adaptation
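
The abstract's central observation, that false features help once the domain of the data is known, can be illustrated with a toy simulation. The sketch below is not the CDGA model, its risk function, or its training algorithm. The function names (`make_env`, `fit_env_specific`), the probabilities 0.75, 0.9, and 0.1, and the lookup-table classifier are all assumptions chosen for illustration, loosely in the spirit of a Colored MNIST setup where a color cue correlates with the label differently in each environment.

```python
import numpy as np

def make_env(n, p_false, rng, p_real=0.75):
    """Toy environment (assumed, not from the paper): two binary features,
    each agreeing with the binary label with a fixed probability."""
    y = rng.integers(0, 2, size=n)
    # Real feature: same reliability in every environment.
    real = np.where(rng.random(n) < p_real, y, 1 - y)
    # False feature: reliability (and even sign) depends on the environment.
    false = np.where(rng.random(n) < p_false, y, 1 - y)
    return real, false, y

def fit_env_specific(real, false, y):
    """Environment-specific classifier: majority label per (real, false) cell."""
    table = np.zeros((2, 2), dtype=int)
    for r in (0, 1):
        for f in (0, 1):
            mask = (real == r) & (false == f)
            # Fall back to the real feature if a cell happens to be empty.
            table[r, f] = int(y[mask].mean() >= 0.5) if mask.any() else r
    return table

rng = np.random.default_rng(0)
results = {}
for name, p_false in {"env_A": 0.9, "env_B": 0.1}.items():
    r_tr, f_tr, y_tr = make_env(20000, p_false, rng)
    r_te, f_te, y_te = make_env(20000, p_false, rng)
    table = fit_env_specific(r_tr, f_tr, y_tr)
    # Generalizing classifier: ignores the false feature entirely.
    acc_invariant = float((r_te == y_te).mean())
    # Environment-specific classifier: uses both features for this domain.
    acc_specific = float((table[r_te, f_te] == y_te).mean())
    results[name] = (acc_invariant, acc_specific)
    print(f"{name}: invariant={acc_invariant:.3f} env-specific={acc_specific:.3f}")
```

In this toy setting the classifier that relies only on the real feature behaves the same way in both environments, while the per-environment classifier gains accuracy by also using the false feature. That gain is only available when the environment is known, which is the situation the adaptation task in the abstract targets.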

Chinese Library Classification (CLC) number: