A Survey of Statistical Language Modeling for Text Retrieval

Ding Guodong, Bai Shuo, and Wang Bin

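Abstract (translated from the Chinese): Statistical language modeling (SLM) has gradually become one of the mainstream techniques in language information processing. Research and experiments in recent years indicate that SLM has broad prospects and considerable room for extension in text retrieval. This paper surveys SLM-based text retrieval (SLMTR), focusing on its main methods and key techniques. It first gives a formal description of the query likelihood retrieval model, then discusses language model estimation and data smoothing in detail, together with the effect of smoothing on retrieval performance. It then briefly introduces the main extensions and improvements to the query likelihood model. The concluding section discusses several challenges facing SLMTR.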

Abstract: Built on statistical inference theory, statistical language modeling (SLM) has gradually become one of the crucial techniques in language information processing. Ponte & Croft first applied SLM to text retrieval in 1998, and a large number of studies have since focused on language models. In recent years, work by many research groups has confirmed that the language modeling approach is a theoretically attractive and potentially very effective probabilistic framework for studying text retrieval problems. This paper surveys the basic language modeling approaches to text retrieval (SLMTR). The query likelihood model is formally described, and document language model estimation and data smoothing are discussed in detail. Some significant extensions and improvements to the query likelihood model are also presented. Finally, several primary challenges and open issues in SLMTR are summarized.
