ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (2): 262-269. doi: 10.7544/issn1000-1239.2016.20150742

Special topic: 2016 Special Topic on Data Fusion and Knowledge Fusion

• Software Technology •

Research on Short Text Understanding

王仲远1,2,程健鹏2,3,王海勋4,文继荣1   

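  1. 1(School of Information, Renmin University of China, Beijing 100872); 2(Microsoft Research Asia, Beijing 100080); 3(Department of Computer Science, University of Oxford, Oxford, UK OX1 3QD); 4(Facebook, Menlo Park, CA, USA 94025) (zhy.wang@microsoft.com)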
  • Published: 2016-02-01
  • Supported by: 
    National Basic Research Program of China (973 Program) (2014CB340403); Fundamental Research Funds for the Central Universities (14XNLF05)

Short Text Understanding: A Survey

Wang Zhongyuan1,2, Cheng Jianpeng2,3, Wang Haixun4, Wen Jirong1   

  1. 1(School of Information, Renmin University of China, Beijing 100872); 2(Microsoft Research Asia, Beijing 100080); 3(Department of Computer Science, Oxford University, Oxford, UK OX1 3QD); 4(Facebook, Menlo Park, CA, USA 94025)
  • Online: 2016-02-01

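Abstract (from the Chinese edition): Short text understanding is a task that is crucial to machine intelligence yet full of challenges. It benefits many applications, such as search engines, automatic question answering, advertising, and recommendation systems. The first step in these applications is to convert the input text into a form that machines can interpret, that is, to help machines "understand" the meaning of the short text. To this end, many methods draw on external knowledge sources to compensate for the insufficient context in short texts. By summarizing related work in short text understanding, this paper introduces a vector-based framework for short text understanding. It also discusses future research directions in the field.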

Key words (from the Chinese edition): knowledge mining, short text understanding, conceptualization, semantic computing

Abstract: Short text understanding is an important but challenging task for machine intelligence. It can benefit various online applications, such as search engines, automatic question answering, online advertising, and recommendation systems. In all these applications, the necessary first step is to transform an input text into a machine-interpretable representation, that is, to “understand” the short text. To achieve this goal, various approaches leverage external knowledge sources to compensate for the limited contextual information that accompanies short texts. This survey reviews current progress in short text understanding, with a focus on vector-based approaches, which aim to derive a vector encoding of a short text. We also explore a few potential research topics in the field of short text understanding.

Key words: knowledge mining, short text understanding, conceptualization, semantic computing

Chinese Library Classification (CLC) number: