ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (1): 165-192. doi: 10.7544/issn1000-1239.2016.20150661

Special topic: 2016 Excellent Young Scientists Special Issue

• Software Technology •

A Survey on Entity Alignment of Knowledge Base

Zhuang Yan, Li Guoliang, Feng Jianhua

  1. (Department of Computer Science and Technology, Tsinghua University, Beijing 100084) (joyear2008@163.com)
  • Published: 2016-01-01
  • Online: 2016-01-01
  • Supported by:
    the National Natural Science Foundation of China Excellent Young Scientists Fund (61422205); the National Basic Research Program of China (973 Program) (2015CB358700)

Abstract: Entity alignment of knowledge bases has been an active research topic in recent years. Its goal is to link multiple existing knowledge bases with high quality and to build a large-scale, unified knowledge base on top of them, which helps machines understand the underlying data and supports more intelligent applications. However, many challenges remain in data quality, matching efficiency, and scalability, especially in the context of big data. Starting from these challenges, this paper surveys the techniques and algorithms applicable to knowledge base entity alignment that have appeared over the past decade or so, and classifies and summarizes existing methods to provide options for further research. First, the entity alignment problem is formally defined. Second, the overall architecture is outlined, and research progress is reviewed in detail from three aspects: alignment algorithms, feature matching techniques, and blocking and indexing techniques. Alignment algorithms are the key to the problem and can be divided into pair-wise methods and collective methods; the mainstream collective alignment algorithms are discussed in detail from local and global perspectives. Commonly used evaluation data sets, both experimental and real-world, are also introduced. Finally, open research issues and possible future research directions are discussed.

Key words: knowledge base, entity alignment, similarity propagation, probabilistic model, similarity function, blocking and indexing
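
As a rough illustration of the pair-wise setting named in the abstract, the following Python sketch combines a blocking step with a token-level Jaccard similarity function. The entity records, the blocking key (first character of the name), and the 0.5 threshold are assumptions made for illustration only; they are not taken from the surveyed paper.

```python
from collections import defaultdict
from itertools import product


def jaccard(a, b):
    """Token-level Jaccard similarity between two names (0.0 if both are empty)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def block_by_first_char(entities):
    """Blocking: group entity ids by the first character of their name (assumed key)."""
    blocks = defaultdict(list)
    for ent_id, name in entities.items():
        blocks[name[:1].lower()].append(ent_id)
    return blocks


def align_pairwise(kb_a, kb_b, threshold=0.5):
    """Compare only pairs that share a block; keep pairs scoring at or above threshold."""
    blocks_a = block_by_first_char(kb_a)
    blocks_b = block_by_first_char(kb_b)
    matches = []
    for key in blocks_a.keys() & blocks_b.keys():
        for id_a, id_b in product(blocks_a[key], blocks_b[key]):
            score = jaccard(kb_a[id_a], kb_b[id_b])
            if score >= threshold:
                matches.append((id_a, id_b, score))
    return sorted(matches)


# Hypothetical toy knowledge bases: entity id -> entity name.
kb_a = {"a1": "Tsinghua University", "a2": "Beijing"}
kb_b = {
    "b1": "Tsinghua University Beijing",
    "b2": "Beijing City",
    "b3": "Peking University",
}

for id_a, id_b, score in align_pairwise(kb_a, kb_b):
    print(id_a, id_b, round(score, 3))
```

Each candidate pair here is scored independently of every other pair, which is what makes the sketch pair-wise. A collective method would additionally let the decision for one pair influence related pairs, for example through similarity propagation, which this sketch does not attempt.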

CLC number: