ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2022, Vol. 59 ›› Issue (1): 118-126.doi: 10.7544/issn1000-1239.20200626

Previous Articles     Next Articles

An Empirical Investigation of Generalization and Transfer in Short Text Matching

Ma Xinyu, Fan Yixing, Guo Jiafeng, Zhang Ruqing, Su Lixin, Cheng Xueqi   

  1. (CAS Key Laboratory of Network Data Science & Technology (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190) (University of Chinese Academy of Sciences, Beijing 100049)
  • Online:2022-01-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61722211, 61773362, 61872338, 62006218, 61902381), the National Key Research and Development Program of China (2016QY02D0405), the Project of Beijing Academy of Artificial Intelligence (BAAI2019ZD0306), the Youth Innovation Promotion Association CAS (20144310, 2016102), the Project of Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcyjBX0059), the K.C.Wong Education Foundation, and the Lenovo-CAS Joint Lab Youth Scientist Project.

Abstract: Many tasks in natural language understanding, such as natural language inference, question answering, and paraphrasing can be viewed as short text matching problems. Recently, the emergence of a large number of datasets and deep learning models has made great success in short text matching. However, little study has been done on analyzing the generalization of these datasets across different text matching tasks, and how to leverage these supervised datasets of multiple domains to new domains to reduce the cost of annotating and improve their performance. In this paper, we conduct an extensive investigation of generalization and transfer across different datasets and show the factors that affect the generalization through visualization. Specially, we experiment with a conventional neural semantic matching model ESIM (enhanced sequential inference model) and a pre-trained language model BERT (bidirectional encoder representations from transformers) over 10 common datasets. We show that even BERT which is pre-trained on a large-scale dataset can still improve performance on the target dataset through transfer learning. Following our analysis, we also demonstrate that pre-training on multiple datasets shows good generalization and transfer. In the case of a new domain and few-shot setting, BERT which we pre-train on the multiple datasets first and then transfers to new datasets achieves exciting performance.

Key words: short text matching, generalization, transfer, few-shot, pre-trained language model

CLC Number: