An Empirical Investigation of Generalization and Transfer in Short Text Matching

Ma Xinyu; Fan Yixing; Guo Jiafeng; Zhang Ruqing; Su Lixin; Cheng Xueqi

doi:10.7544/issn1000-1239.20200626

Journal of Computer Research and Development, 2022, 59(1): 118-126. CSTR: 32373.14.issn1000-1239.20200626

Citation: Ma Xinyu, Fan Yixing, Guo Jiafeng, Zhang Ruqing, Su Lixin, Cheng Xueqi. An Empirical Investigation of Generalization and Transfer in Short Text Matching[J]. Journal of Computer Research and Development, 2022, 59(1): 118-126. DOI: 10.7544/issn1000-1239.20200626

(CAS Key Laboratory of Network Data Science & Technology (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100049)

Funds: This work was supported by the National Natural Science Foundation of China (61722211, 61773362, 61872338, 62006218, 61902381), the National Key Research and Development Program of China (2016QY02D0405), the Project of Beijing Academy of Artificial Intelligence (BAAI2019ZD0306), the Youth Innovation Promotion Association CAS (20144310, 2016102), the Project of Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcyjBX0059), the K.C.Wong Education Foundation, and the Lenovo-CAS Joint Lab Youth Scientist Project.

Published Date: December 31, 2021

Abstract

Many tasks in natural language understanding, such as natural language inference, question answering, and paraphrasing, can be viewed as short text matching problems. Recently, the emergence of a large number of datasets and deep learning models has led to great success in short text matching. However, little work has analyzed how well these datasets generalize across different text matching tasks, or how supervised datasets from multiple domains can be leveraged in new domains to reduce annotation cost and improve performance. In this paper, we conduct an extensive investigation of generalization and transfer across different datasets and use visualization to show the factors that affect generalization. Specifically, we experiment with a conventional neural semantic matching model, ESIM (enhanced sequential inference model), and a pre-trained language model, BERT (bidirectional encoder representations from transformers), over 10 common datasets. We show that even BERT, which is pre-trained on a large-scale dataset, can still improve its performance on the target dataset through transfer learning. Following our analysis, we also demonstrate that pre-training on multiple datasets yields good generalization and transfer. In a new domain under the few-shot setting, BERT that is first pre-trained on multiple datasets and then transferred to the new dataset achieves promising performance.
Keywords: short text matching; generalization; transfer; few-shot; pre-trained language model
