ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (6): 1213-1224. doi: 10.7544/issn1000-1239.2017.20160908

Special topic: 2017 Excellent Young Scientists

• Review •

A Survey of Distributed RDF Data Management

Zou Lei1, Peng Peng2

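  1 (Institute of Computer Science & Technology, Peking University, Beijing 100080); 2 (College of Information Science and Engineering, Hunan University, Changsha 410082) (zoulei@pku.edu.cn)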
  • Published: 2017-06-01
  • Funding: National Natural Science Foundation of China, Excellent Young Scientists Fund (61622201)

A Survey of Distributed RDF Data Management

Zou Lei1, Peng Peng2   

  1 (Institute of Computer Science & Technology, Peking University, Beijing 100080); 2 (College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082)
  • Online: 2017-06-01

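Abstract: The resource description framework (RDF) is a model for presenting, sharing, and linking data on the Web, and it has been widely used in many applications. SPARQL (simple protocol and RDF query language) is a structured query language used to query and retrieve RDF data. As RDF datasets keep growing, SPARQL query processing over existing RDF databases has exceeded the capacity of a single machine. High-performance distributed RDF databases are therefore needed to process SPARQL queries efficiently. A large body of work has already examined how to build distributed RDF data management systems. This paper surveys these approaches and classifies them into three categories: cloud-based distributed RDF data management approaches, partitioning-based distributed RDF data management approaches, and federated systems. Cloud-based approaches manage RDF data on existing cloud platforms. Partitioning-based approaches first divide the RDF data graph into several subgraphs and then assign these subgraphs to different computing nodes. In federated systems, the data is already distributed across different nodes, so the data management system cannot control how it is distributed. For each category, an in-depth discussion helps readers understand the characteristics of the various approaches.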
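To make the idea of "querying RDF data with SPARQL" concrete, here is a minimal sketch, not taken from the paper, of how a SPARQL basic graph pattern can be matched against a set of RDF triples. Triples are modeled as plain `(subject, predicate, object)` string tuples, and terms starting with `?` are treated as variables. The helper names `match_pattern` and `evaluate_bgp` and the `ex:` sample data are hypothetical illustrations. Real RDF stores use indexes and join optimization rather than the naive nested loops shown here.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def is_var(term: str) -> bool:
    # Assumption: SPARQL-style variables are written with a leading '?'.
    return term.startswith("?")


def match_pattern(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Try to extend `binding` so that `pattern` matches `triple`; None on conflict."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            # A variable already bound to a different value cannot match.
            if p_term in extended and extended[p_term] != t_term:
                return None
            extended[p_term] = t_term
        elif p_term != t_term:
            # Constant terms must match exactly.
            return None
    return extended


def evaluate_bgp(patterns: List[Triple], triples: List[Triple]) -> List[Binding]:
    """Naively evaluate a basic graph pattern, joining one triple pattern at a time."""
    bindings: List[Binding] = [{}]
    for pattern in patterns:
        next_bindings: List[Binding] = []
        for b in bindings:
            for t in triples:
                m = match_pattern(pattern, t, b)
                if m is not None:
                    next_bindings.append(m)
        bindings = next_bindings
    return bindings


# Hypothetical example data (not from the paper).
TRIPLES: List[Triple] = [
    ("ex:Alice", "ex:worksAt", "ex:PKU"),
    ("ex:Bob", "ex:worksAt", "ex:HNU"),
    ("ex:PKU", "ex:locatedIn", "ex:Beijing"),
    ("ex:HNU", "ex:locatedIn", "ex:Changsha"),
]

# Corresponds to: SELECT ?p ?city WHERE { ?p ex:worksAt ?org . ?org ex:locatedIn ?city }
QUERY: List[Triple] = [("?p", "ex:worksAt", "?org"), ("?org", "ex:locatedIn", "?city")]

if __name__ == "__main__":
    for b in evaluate_bgp(QUERY, TRIPLES):
        print(b["?p"], b["?city"])
```

The join on the shared variable `?org` is what makes SPARQL evaluation expensive at scale. When the triples are spread over many machines, such joins may need data from several of them.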

Keywords: RDF data management, SPARQL query processing, distributed database system, cloud computing, linked data

Abstract: Recently, RDF (resource description framework) has been widely used to expose, share, and connect pieces of data on the Web, while SPARQL (simple protocol and RDF query language) is a structured query language for accessing RDF repositories. As RDF datasets grow in size, evaluating SPARQL queries over current RDF repositories exceeds the capacity of a single machine. As a result, high-performance distributed RDF database systems are needed to evaluate SPARQL queries efficiently. A large body of work addresses distributed RDF data management using different approaches, and this paper provides an overview of that work. The survey considers three kinds of distributed data management approaches: cloud-based approaches, partitioning-based approaches, and federated RDF systems. In short, cloud-based approaches use existing cloud platforms to manage large RDF datasets. Partitioning-based approaches divide an RDF graph into several fragments and place each fragment at a different site in a distributed system. Federated RDF systems cannot repartition the data, because it is already distributed over their own autonomous sites. For each kind of approach, further discussion helps readers understand the characteristics of the different approaches.
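Partitioning-based approaches divide an RDF graph into fragments and place each fragment at a different site. The sketch below illustrates one simple, commonly used strategy: hash-partitioning triples by subject. It is offered only as an example and is not necessarily a method from this survey. The function names, the choice of `zlib.crc32` as the hash, and the sample data are all hypothetical. Because every triple with the same subject lands on the same site, a "star" query (all triple patterns share one subject variable) can be answered at each site independently, and the answers are simply unioned. Queries that join on other positions, such as path queries, may instead require cross-site joins.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def site_of(subject: str, num_sites: int) -> int:
    # Hypothetical placement function; crc32 is deterministic across runs,
    # unlike Python's salted built-in hash() for strings.
    return zlib.crc32(subject.encode("utf-8")) % num_sites


def partition_by_subject(triples: List[Triple], num_sites: int) -> Dict[int, List[Triple]]:
    """Assign each triple to the site chosen by hashing its subject."""
    fragments: Dict[int, List[Triple]] = defaultdict(list)
    for t in triples:
        fragments[site_of(t[0], num_sites)].append(t)
    return dict(fragments)


def eval_star_locally(fragment: List[Triple], star: List[Tuple[str, str]]) -> List[Binding]:
    """Evaluate a star query given as (predicate, object_variable) pairs on subject ?s."""
    by_subject: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for s, p, o in fragment:
        by_subject[s][p].append(o)
    results: List[Binding] = []
    for s, preds in by_subject.items():
        rows: List[Binding] = [{"?s": s}]
        for pred, var in star:
            # Subjects lacking any required predicate produce no rows.
            rows = [{**r, var: o} for r in rows for o in preds.get(pred, [])]
        results.extend(rows)
    return results


def evaluate_distributed(fragments: Dict[int, List[Triple]], star: List[Tuple[str, str]]) -> List[Binding]:
    """Union of per-site answers; no inter-site joins are needed for star queries."""
    answers: List[Binding] = []
    for site in sorted(fragments):
        answers.extend(eval_star_locally(fragments[site], star))
    return answers


# Hypothetical example data (not from the paper).
TRIPLES: List[Triple] = [
    ("ex:Alice", "ex:name", '"Alice"'),
    ("ex:Alice", "ex:worksAt", "ex:PKU"),
    ("ex:Bob", "ex:name", '"Bob"'),
    ("ex:Bob", "ex:worksAt", "ex:HNU"),
    ("ex:Carol", "ex:name", '"Carol"'),
    ("ex:PKU", "ex:locatedIn", "ex:Beijing"),
]
STAR = [("ex:name", "?name"), ("ex:worksAt", "?org")]

if __name__ == "__main__":
    fragments = partition_by_subject(TRIPLES, num_sites=3)
    for b in sorted(evaluate_distributed(fragments, STAR), key=lambda b: b["?s"]):
        print(b["?s"], b["?name"], b["?org"])
```

Subject hashing is cheap and keeps star queries local. Graph-aware partitioning schemes trade a more expensive partitioning step for fewer cross-site joins on other query shapes.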

Key words: RDF data management, SPARQL query processing, distributed database system, cloud computing, linked data
