ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2006, Vol. 43 ›› Issue (3): 489-495.


话题识别与跟踪中的层次化话题识别技术研究

于满泉1,2 骆卫华1 许洪波1 白 硕1   

  1(中国科学院计算技术研究所软件研究室 北京 100080) 2(中国科学院研究生院 北京 100049) (yumanquan@software.ict.ac.cn)
  • 出版日期: 2006-03-15

Research on Hierarchical Topic Detection in Topic Detection and Tracking

Yu Manquan1,2, Luo Weihua1, Xu Hongbo1, and Bai Shuo1   

  1(Software Laboratory, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100080) 2(Graduate University, Chinese Academy of Sciences, Beijing 100049)
  • Online: 2006-03-15

摘要: 话题识别与跟踪(topic detection and tracking,TDT)旨在发展一系列基于事件的信息组织技术,层次化话题识别(hierarchical topic detection,HTD)是其中一项全新的任务定义形式.通过连续的大规模评测,话题识别与跟踪已成为国际上自然语言处理尤其是信息检索领域的一个研究热点.为此,将自然语言处理与信息检索技术相结合,提出了针对事件特点的切实有效的单粒度话题识别方法,并提出了基于多层聚类的MLCS算法对话题进行层次化组织.所提出的方法具有很好的效果,在TDT2004的HTD评测中,该方法取得了第2名的成绩.

关键词: 话题识别与跟踪, 层次化话题识别, 多层聚类, 命名实体, 指代消解

Abstract: Topic detection and tracking (TDT) aims to develop a family of technologies for event-based information organization, and hierarchical topic detection (HTD) is a newly defined task within it. Through a series of large-scale evaluations, TDT has become an active research topic worldwide in natural language processing, and in information retrieval in particular. This paper combines natural language processing and information retrieval techniques to propose an effective single-granularity topic detection method tailored to the characteristics of events, together with MLCS, an algorithm based on multi-layered clustering that organizes topics into a hierarchical structure. The proposed approach ranked second in the HTD evaluation of TDT2004.

Key words: topic detection and tracking, hierarchical topic detection, multi-layered clustering, named entity, reference resolution