ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (1): 1-21. doi: 10.7544/issn1000-1239.2021.20190785

  • Published: 2021-01-01
Survey on Automatic Text Summarization

Li Jinpeng1,2, Zhang Chuang1, Chen Xiaojun1, Hu Yue1,2, Liao Pengcheng1,2   

  1. Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093
  2. School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100040
  • Online: 2021-01-01
    This work was supported by the National Natural Science Foundation of China (61602474).

Abstract: In recent years, the rapid development of Internet technology has greatly facilitated daily life, but it has also caused the volume of online information to grow explosively. Obtaining the required information quickly and effectively has therefore become an urgent problem. Automatic text summarization can effectively alleviate this problem. As an important research topic in natural language processing and artificial intelligence, it uses a computer to automatically produce, from a long text or a collection of texts, a concise and coherent summary that accurately reflects the central content of the source. In this paper, we discuss the scope of the automatic text summarization task, review and analyze the development of summarization techniques, and describe in detail the work on the two main forms of summary production, extractive and abstractive, covering feature scoring, classification, linear programming, submodular functions, graph ranking, sequence labeling, heuristic algorithms, and deep learning. We also analyze the datasets and evaluation metrics commonly used in automatic text summarization. Finally, we discuss the challenges the field faces and anticipate future trends in research and application.

Key words: automatic text summarization, extractive methods, abstractive methods, deep learning, ROUGE metric