Survey on Automatic Text Summarization

李金鹏; 张闯; 陈小军; 胡玥; 廖鹏程

doi:10.7544/issn1000-1239.2021.20190785


Abstract

Abstract: In recent years, the rapid development of Internet technology has greatly facilitated daily life, but it has also led to an explosive growth of online information, making it essential to find the required information quickly and effectively. Automatic text summarization can effectively alleviate this problem. As one of the important research topics in natural language processing and artificial intelligence, it uses computers to automatically distill, from a long text or a collection of texts, a concise and coherent short text that accurately reflects the central content of the source. In this paper, we discuss what the task of automatic text summarization entails, review and analyze the development of the technique, and describe in detail the work on the two main forms of summary production at present, extractive and abstractive, covering feature scoring, classification algorithms, linear programming, submodular functions, graph ranking, sequence labeling, heuristic algorithms, and deep learning. We also analyze the datasets and evaluation metrics commonly used in automatic text summarization. Finally, we discuss the challenges it faces and anticipate future trends in research and application.
