Ma Xinyu, Fan Yixing, Guo Jiafeng, Zhang Ruqing, Su Lixin, Cheng Xueqi. An Empirical Investigation of Generalization and Transfer in Short Text Matching[J]. Journal of Computer Research and Development, 2022, 59(1): 118-126. DOI: 10.7544/issn1000-1239.20200626

An Empirical Investigation of Generalization and Transfer in Short Text Matching

Funds: This work was supported by the National Natural Science Foundation of China (61722211, 61773362, 61872338, 62006218, 61902381), the National Key Research and Development Program of China (2016QY02D0405), the Project of Beijing Academy of Artificial Intelligence (BAAI2019ZD0306), the Youth Innovation Promotion Association CAS (20144310, 2016102), the Project of Chongqing Research Program of Basic Research and Frontier Technology (cstc2017jcyjBX0059), the K.C.Wong Education Foundation, and the Lenovo-CAS Joint Lab Youth Scientist Project.
  • Published Date: December 31, 2021
  • Abstract: Many tasks in natural language understanding, such as natural language inference, question answering, and paraphrase identification, can be cast as short text matching problems. Recently, the emergence of large-scale datasets and deep learning models has brought great success to short text matching. However, little work has analyzed how well these datasets generalize across different text matching tasks, or how supervised datasets from multiple domains can be leveraged in a new domain to reduce annotation cost and improve performance. In this paper, we conduct an extensive investigation of generalization and transfer across different datasets and show, through visualization, the factors that affect generalization. Specifically, we experiment with a conventional neural semantic matching model, ESIM (enhanced sequential inference model), and a pre-trained language model, BERT (bidirectional encoder representations from transformers), over 10 common datasets. We show that even BERT, which is already pre-trained on a large-scale corpus, can further improve performance on the target dataset through transfer learning. Following this analysis, we also demonstrate that pre-training on multiple datasets yields good generalization and transfer. In a new-domain, few-shot setting, BERT that is first trained on multiple source datasets and then transferred to the new dataset achieves strong performance.
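As a rough illustration of the two-stage transfer setup described in the abstract (not the authors' released code), the sketch below fine-tunes a Hugging Face BERT model on the union of several source matching datasets and then continues training on a small target dataset. The model name, the source/target dataset names, and the load_pairs helper are placeholders introduced here for illustration only.

```python
# Minimal sketch of multi-dataset intermediate training followed by transfer
# to a new matching dataset. Dataset names and load_pairs are hypothetical.
import torch
from torch.utils.data import DataLoader, ConcatDataset, TensorDataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"  # assumed backbone
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSequenceClassification.from_pretrained(MODEL_NAME, num_labels=2)


def load_pairs(name):
    """Hypothetical loader; replace with real dataset reading. Toy data here."""
    pairs = [("a man is playing a guitar", "a person plays an instrument"),
             ("it is raining heavily", "the weather is sunny")]
    labels = [1, 0]  # 1 = match / entailment, 0 = non-match
    return pairs, labels


def encode_pairs(pairs, labels, max_len=128):
    """Tokenize (text_a, text_b) pairs into a TensorDataset for BERT."""
    enc = tokenizer([a for a, _ in pairs], [b for _, b in pairs],
                    truncation=True, padding="max_length",
                    max_length=max_len, return_tensors="pt")
    return TensorDataset(enc["input_ids"], enc["attention_mask"],
                         torch.tensor(labels))


def train(model, dataset, epochs=2, lr=2e-5, batch_size=32):
    """One standard fine-tuning loop over a matching dataset."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    optim = torch.optim.AdamW(model.parameters(), lr=lr)
    model.train()
    for _ in range(epochs):
        for input_ids, attention_mask, labels in loader:
            out = model(input_ids=input_ids, attention_mask=attention_mask,
                        labels=labels)
            out.loss.backward()
            optim.step()
            optim.zero_grad()


# Stage 1: intermediate training on the union of several source datasets
# ("snli", "qqp", "mrpc" are assumed examples, not the paper's exact list).
source_sets = [encode_pairs(*load_pairs(name)) for name in ["snli", "qqp", "mrpc"]]
train(model, ConcatDataset(source_sets))

# Stage 2: transfer to a new target dataset with limited labeled data.
target = encode_pairs(*load_pairs("target_domain_small"))
train(model, target, epochs=5, lr=1e-5)
```

In a few-shot scenario, Stage 2 would see only a handful of labeled target pairs, which is where the intermediate training in Stage 1 is expected to help.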
