Abstract:
Many tasks in natural language understanding, such as natural language inference, question answering, and paraphrase identification, can be viewed as short text matching problems. Recently, the emergence of a large number of datasets and deep learning models has driven great progress in short text matching. However, little work has been done on analyzing how models trained on these datasets generalize across different text matching tasks, or on how to leverage supervised datasets from multiple domains in a new domain to reduce annotation cost and improve performance. In this paper, we conduct an extensive investigation of generalization and transfer across different datasets and identify, through visualization, the factors that affect generalization. Specifically, we experiment with a conventional neural semantic matching model, ESIM (enhanced sequential inference model), and a pre-trained language model, BERT (bidirectional encoder representations from transformers), on 10 common datasets. We show that even BERT, which is pre-trained on a large-scale corpus, can still improve performance on the target dataset through transfer learning. Following our analysis, we also demonstrate that pre-training on multiple datasets yields good generalization and transfer. In the few-shot setting on a new domain, BERT that is first pre-trained on multiple datasets and then transferred to the new dataset achieves strong performance.