A Survey on Entity Alignment of Knowledge Base

Zhuang Yan (庄严); Li Guoliang (李国良); Feng Jianhua (冯建华)

doi:10.7544/issn1000-1239.2016.20150661

(Department of Computer Science and Technology, Tsinghua University, Beijing 100084) (joyear2008@163.com)

Funding: National Natural Science Foundation of China, Excellent Young Scientists Fund (61422205); National Basic Research Program of China (973 Program) (2015CB358700)

Chinese Library Classification (CLC) number: TP311.13; TP182
Publication history
- Published: 2015-12-31

Abstract

Entity alignment across knowledge bases has been an active research topic in recent years. Its goal is to link multiple existing knowledge bases with high quality and to build, from the top level, a large-scale unified knowledge base that helps machines understand the underlying data and supports more intelligent applications. However, many problems and challenges remain open, notably in data quality and in matching efficiency and scalability, especially in the context of big data. Starting from these challenges, this paper surveys the techniques and algorithms from the past decade or more that can be applied to knowledge base entity alignment, and classifies and summarizes the existing methods to provide options for further research. First, the entity alignment problem is formally defined. Second, the overall architecture of entity alignment is outlined, and the available methods and research progress are reviewed in detail from three aspects: alignment algorithms, feature matching techniques, and blocking and indexing techniques. Alignment algorithms are the key to the problem and can be divided into pair-wise methods and collective methods; the mainstream collective alignment algorithms are described in detail from the local and global perspectives. Commonly used evaluation data sets, both experimental and real-world, are also introduced. Finally, open research issues and possible future research directions are discussed.

Keywords: knowledge base; entity alignment; similarity propagation; probabilistic model; similarity function; blocking and indexing
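
As a rough illustration of how two of the building blocks named above fit together in a pair-wise method, the sketch below uses a blocking index to limit the candidate pairs and a similarity function to score them. It is not taken from the survey: the token-based blocking key, the Jaccard similarity, the 0.5 threshold, the helper names (`tokens`, `jaccard`, `block_by_token`, `align`) and the toy entities are all assumptions chosen for brevity.

```python
from collections import defaultdict
from itertools import product


def tokens(name):
    """Lowercase word tokens of an entity name."""
    return set(name.lower().split())


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def block_by_token(entities):
    """Index entity ids by each token of their name (a simple blocking key)."""
    index = defaultdict(set)
    for eid, name in entities.items():
        for tok in tokens(name):
            index[tok].add(eid)
    return index


def align(kb1, kb2, threshold=0.5):
    """Return (id1, id2, score) for candidate pairs scoring >= threshold."""
    idx1, idx2 = block_by_token(kb1), block_by_token(kb2)
    # Only entities that share at least one token are ever compared.
    candidates = set()
    for tok in idx1.keys() & idx2.keys():
        candidates.update(product(idx1[tok], idx2[tok]))
    matches = []
    for id1, id2 in sorted(candidates):
        score = jaccard(tokens(kb1[id1]), tokens(kb2[id2]))
        if score >= threshold:
            matches.append((id1, id2, score))
    return matches


# Hypothetical toy knowledge bases: entity id -> entity name.
kb_a = {"a1": "Tsinghua University", "a2": "Peking University"}
kb_b = {"b1": "tsinghua university beijing", "b2": "Fudan University"}
result = align(kb_a, kb_b)
```

Each pair here is decided independently of the others, which is what distinguishes a pair-wise method from the collective methods the survey focuses on. A frequent token such as "university" produces large blocks, so a practical blocking key would need to be more selective.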
