Abstract:
Grounded in statistical inference theory, statistical language modeling (SLM) has gradually become one of the key techniques in language information processing. Ponte & Croft first applied SLM to text retrieval in 1998, and a large number of studies have since focused on language models for retrieval. In recent years, work by many research groups has confirmed that the language modeling approach is a theoretically attractive and potentially very effective probabilistic framework for studying text retrieval problems. This paper surveys the basic statistical language modeling approaches to text retrieval (SLMTR). The query likelihood model is formally described, and the problems of document language model estimation and data smoothing are discussed in detail. Several significant extensions and improvements to the query likelihood model are also presented. Finally, the primary challenges and open issues in SLMTR are summarized.