ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2016, Vol. 53, Issue (8): 1696-1708. doi: 10.7544/issn1000-1239.2016.20160192

Special topic: Frontier Technologies in Data Mining, 2016

• Section: Artificial Intelligence

Consistent Collective Entity Linking Algorithm

Liu Qiao, Zhong Yun, Liu Yao, Wu Zufeng, Qin Zhiguang

  1. (School of Information and Software Engineering, University of Electronic Science and Technology of China, Chengdu 610054) (qliu@uestc.edu.cn)
  • Published online: 2016-08-01
  • Funding:
    National Natural Science Foundation of China (61133016, 61272527, 61202445); Young Scientists Fund of the National Natural Science Foundation of China (61502087); Fundamental Research Funds for the Central Universities (ZYGX2014J066)

Abstract: The goal of entity linking is to link entity mentions extracted from a document to their corresponding entities in a knowledge base. Prevalent approaches fall into two categories: similarity-based approaches, which rely on contextual similarity, and graph-based collective approaches. Each has strengths and weaknesses. Similarity-based approaches are good at distinguishing entities from the perspective of contextual semantics, but make little use of the knowledge already organized in the knowledge base, such as the relationships between entities, to support their decisions. Graph-based approaches make better use of the semantic relations between entities in the knowledge base, but when contextual information is insufficient they have difficulty discriminating between conceptually similar entities. In this work, we present a consistent collective entity linking algorithm that makes better use of the structured semantic relationships between entities in the knowledge base, improving its ability to discriminate between similar entities. We evaluate the method extensively on two public datasets. The experimental results show that it effectively improves the precision and recall of entity linking, significantly outperforms other state-of-the-art algorithms, performs stably on both long and short texts, and adapts and generalizes well.

Key words: collective entity linking, information extraction, knowledge base population, personalized PageRank, semantic correlation
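The abstract contrasts similarity-based linking with graph-based collective linking and lists personalized PageRank among its key words. A minimal sketch of the generic graph-based idea follows, assuming a toy knowledge graph and a sentence in which one mention ("Bulls") has already been linked while another ("Jordan") is ambiguous. The function names (`build_graph`, `personalized_pagerank`, `link_mention`), the example entities, and the damping value `alpha = 0.85` are hypothetical choices for illustration; this is not the consistency-based algorithm proposed in the paper.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: entity -> list of related entities.
Graph = Dict[str, List[str]]


def build_graph(edges: List[Tuple[str, str]]) -> Graph:
    """Build undirected adjacency lists from (entity, entity) relation pairs."""
    graph: Graph = {}
    for a, b in edges:
        graph.setdefault(a, []).append(b)
        graph.setdefault(b, []).append(a)
    return graph


def personalized_pagerank(graph: Graph, seeds: Dict[str, float],
                          alpha: float = 0.85, iters: int = 50) -> Dict[str, float]:
    """Power iteration for personalized PageRank.

    With probability alpha the walker follows an edge; otherwise it restarts
    at a seed node. Seed keys must be nodes of the graph; weights sum to 1.
    """
    score = {n: seeds.get(n, 0.0) for n in graph}
    for _ in range(iters):
        nxt = {n: (1 - alpha) * seeds.get(n, 0.0) for n in graph}
        for n, neighbours in graph.items():
            if neighbours:
                share = alpha * score[n] / len(neighbours)
                for m in neighbours:
                    nxt[m] += share
            else:
                # Dangling node: return its mass to the restart distribution.
                for s, w in seeds.items():
                    nxt[s] += alpha * score[n] * w
        score = nxt
    return score


def link_mention(candidates: List[str], graph: Graph,
                 context_entities: List[str]) -> str:
    """Pick the candidate most related to entities already linked in context."""
    anchors = [e for e in context_entities if e in graph]
    if not anchors:
        raise ValueError("no context entity appears in the knowledge graph")
    seeds = {e: 1.0 / len(anchors) for e in anchors}
    scores = personalized_pagerank(graph, seeds)
    return max(candidates, key=lambda c: scores.get(c, 0.0))


# Toy knowledge-base relations (illustrative, not from the paper's datasets).
kb_edges = [
    ("Michael_Jordan", "Chicago_Bulls"),
    ("Michael_Jordan", "NBA"),
    ("Chicago_Bulls", "NBA"),
    ("Michael_I._Jordan", "UC_Berkeley"),
    ("Michael_I._Jordan", "Machine_learning"),
    ("Jordan_(country)", "Amman"),
]
graph = build_graph(kb_edges)

# "Jordan led the Bulls to six titles.": "Bulls" is assumed already linked.
candidates = ["Michael_Jordan", "Michael_I._Jordan", "Jordan_(country)"]
print(link_mention(candidates, graph, ["Chicago_Bulls"]))  # Michael_Jordan
```

Seeding the walk with `Machine_learning` instead of `Chicago_Bulls` flips the decision to `Michael_I._Jordan`, which is the sense in which relations in the knowledge base let already-linked entities vote on ambiguous ones. The sketch also exposes the weakness the abstract mentions: if the context entities are connected symmetrically to two similar candidates, their scores can tie, and the graph alone cannot separate them.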
