ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2019, Vol. 56 ›› Issue (8): 1758-1771. doi: 10.7544/issn1000-1239.2019.20190169

Special topic: 2019 Special Topic on Frontier Advances in Artificial Intelligence

• Artificial Intelligence •

Low-Redundancy Knowledge Graph Management with Range Query Support

Wang Fei, Qian Tieyun, Liu Bin, Peng Zhiyong

  1. (School of Computer Science, Wuhan University, Wuhan 430072) (feiw@whu.edu.cn)
  • Published: 2019-08-01
  • Online: 2019-08-01
  • Supported by:
    National Key Research and Development Program of China (2018YFB1003400); National Natural Science Foundation of China (61572376); Fundamental Research Funds for the Central Universities (2042019k10278); Key Program of the Joint Funds of the National Natural Science Foundation of China (U1811263)

Abstract: As more and more data is organized and published as knowledge graphs, knowledge graph management has attracted considerable attention. Existing approaches have two clear drawbacks: 1) their logical storage modeling produces a large amount of data redundancy and cannot effectively support range queries on continuous attributes; 2) their semantic storage modeling is expensive and adapts poorly to the dynamic evolution of queries over the knowledge graph. We propose the cluster object deputy model (CODM) to model and manage knowledge and metadata. The model has two key properties: schema-based logical storage modeling and lightweight semantic storage modeling. CODM uses a schema clustering algorithm based on set edit distance to convert a knowledge graph into schema data, which gives the data schema-based storage and allows indexes to be specialized by attribute data type. In addition, CODM builds a class hierarchy to model the various semantic associations among entities, and uses object pointers to materialize generalized semantic associations in a lightweight way. Experimental results show that CODM greatly reduces data redundancy, outperforms state-of-the-art methods on range queries, and accelerates the processing of complex queries.

Key words: knowledge graph, metadata modeling, range query, schema storage, cluster object deputy model (CODM)
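
The abstract does not give the details of the schema clustering algorithm or its distance definition, so the following is only a minimal Python sketch of the general idea, not the authors' implementation. It assumes that the set edit distance between two attribute sets is the size of their symmetric difference, that clustering is a greedy single pass with a threshold, and that a sorted list stands in for a type-specialized index on a numeric attribute. All function names, the `max_dist` parameter, and the sample entities are hypothetical.

```python
from bisect import bisect_left, bisect_right


def set_edit_distance(a, b):
    """Assumed definition: number of attribute insertions/deletions
    needed to turn set a into set b (size of the symmetric difference)."""
    return len(a ^ b)


def cluster_schemas(entities, max_dist):
    """Greedy single-pass clustering of entities by attribute set.

    entities: dict mapping entity id -> dict of attribute -> value.
    Returns a list of clusters, each {"schema": set, "members": [ids]}.
    """
    clusters = []
    for eid, attrs in entities.items():
        keys = set(attrs)
        best = None  # (distance, cluster) of the closest admissible cluster
        for c in clusters:
            d = set_edit_distance(keys, c["schema"])
            if d <= max_dist and (best is None or d < best[0]):
                best = (d, c)
        if best is None:
            clusters.append({"schema": set(keys), "members": [eid]})
        else:
            best[1]["schema"] |= keys  # the cluster schema is the union
            best[1]["members"].append(eid)
    return clusters


def build_sorted_index(entities, members, attr):
    """Sorted (value, id) pairs for one numeric attribute of one cluster."""
    return sorted((entities[e][attr], e) for e in members if attr in entities[e])


def range_query(index, lo, hi):
    """Ids whose indexed value lies in the closed interval [lo, hi]."""
    values = [v for v, _ in index]
    return [e for _, e in index[bisect_left(values, lo):bisect_right(values, hi)]]


# Hypothetical sample data: three person-like and two film-like entities.
entities = {
    "p1": {"name": "Ann", "age": 31, "city": "Wuhan"},
    "p2": {"name": "Bo", "age": 45},
    "p3": {"name": "Cy", "age": 27, "city": "Beijing"},
    "f1": {"title": "Film A", "year": 2001, "runtime": 98},
    "f2": {"title": "Film B", "year": 2015},
}

clusters = cluster_schemas(entities, max_dist=1)
print([c["members"] for c in clusters])  # [['p1', 'p2', 'p3'], ['f1', 'f2']]

age_index = build_sorted_index(entities, clusters[0]["members"], "age")
print(range_query(age_index, 30, 50))  # ['p1', 'p2']
```

In this sketch each cluster plays the role of one schema table: entities with nearly identical attribute sets share a row layout, and a range query on a continuous attribute such as `age` becomes two binary searches over that cluster's sorted index instead of a scan over every triple.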

CLC Number: