ISSN 1000-1239 CN 11-1777/TP

• Paper •

Sentence Retrieval with a Topic-Based Language Model

Wu Youzheng, Zhao Jun and Xu Bo

  1. (National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing 100080) (jzhao@nlpr.ia.ac.cn)
  • Published: 2007-02-15
  • Online: 2007-02-15

Abstract: This paper presents a topic-based language model for sentence retrieval in Chinese question answering. The main idea is to exploit information specific to the question answering scenario, namely the semantic category of the expected answer obtained from question classification, to cluster the initially retrieved sentences by topic, and then to incorporate each sentence's topic information into the standard language model through the Aspect Model, which gives a more accurate description of the sentence language model. Two approaches to clustering the initial retrieval results are presented: one-sentence-one-topic and one-sentence-multi-topics. Compared with the topic dimensions of PLSI, the proposed topic space has a clearer physical meaning, and because no iterative computation is required, it also has an advantage in running speed. Comparative experiments show that, relative to the standard language model approach, the proposed topic-based language model significantly improves the performance of the sentence retrieval module of a Chinese question answering system.

Key words: Chinese question answering, language model, sentence retrieval
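
The abstract describes the scoring idea only in words, so the following is a minimal sketch of how a topic-mixed sentence language model could be computed without iteration, not the authors' implementation. It assumes an aspect-model mixture P(w|s) = λ·P_ml(w|s) + (1−λ)·Σ_t P(w|t)·P(t|s), where topics are answer-type labels. The function names, the toy English tokens, the labels `LOC` and `TIME`, the interpolation weight `lam`, the smoothing constant `alpha`, and the uniform split of P(t|s) for multi-topic sentences are all hypothetical choices made for illustration.

```python
import math
from collections import Counter, defaultdict


def topic_word_probs(sentences, topic_weights, vocab, alpha=0.01):
    """Estimate P(w | t) in a single pass over the clustered sentences."""
    counts = defaultdict(Counter)
    for tokens, weights in zip(sentences, topic_weights):
        for topic, weight in weights.items():
            for tok in tokens:
                counts[topic][tok] += weight  # fractional counts for multi-topic sentences
    probs = {}
    for topic, c in counts.items():
        total = sum(c.values()) + alpha * len(vocab)  # additive smoothing (assumed)
        probs[topic] = {w: (c[w] + alpha) / total for w in vocab}
    return probs


def score(query, tokens, weights, p_w_t, lam=0.5):
    """Log query likelihood under the assumed mixture of sentence and topic models."""
    tf = Counter(tokens)
    log_p = 0.0
    for w in query:
        p_ml = tf[w] / len(tokens)  # maximum-likelihood sentence model
        p_topic = sum(p_w_t[t].get(w, 0.0) * p for t, p in weights.items())
        log_p += math.log(max(lam * p_ml + (1 - lam) * p_topic, 1e-12))
    return log_p


# Toy data: each sentence carries P(t | s) over hypothetical answer-type topics.
sentences = [
    ["beijing", "is", "the", "capital", "of", "china"],
    ["the", "capital", "moved", "in", "1949"],
    ["stocks", "fell", "in", "1949"],
]
topic_weights = [
    {"LOC": 1.0},               # one-sentence-one-topic
    {"LOC": 0.5, "TIME": 0.5},  # one-sentence-multi-topics, uniform split (assumed)
    {"TIME": 1.0},
]
query = ["capital", "china"]

vocab = {tok for s in sentences for tok in s} | set(query)
p_w_t = topic_word_probs(sentences, topic_weights, vocab)
scores = [score(query, s, tw, p_w_t) for s, tw in zip(sentences, topic_weights)]
ranking = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
print(ranking)
```

Because the topic assignments come directly from the answer-type labels, P(w|t) is obtained from weighted counts in one pass. This is the sense in which the sketch avoids the iterative estimation that PLSI requires.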