ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (6): 1213-1224. doi: 10.7544/issn1000-1239.2017.20160908

Special Issue: Excellent Young Scholars 2017


A Survey of Distributed RDF Data Management

Zou Lei¹, Peng Peng²

  1. Institute of Computer Science & Technology, Peking University, Beijing 100080
  2. College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082
  • Online: 2017-06-01

Abstract: Recently, RDF (Resource Description Framework) has been widely used to expose, share, and connect pieces of data on the Web, while SPARQL (SPARQL Protocol and RDF Query Language) is a structured query language for accessing RDF repositories. As RDF datasets grow in size, evaluating SPARQL queries over them becomes beyond the capacity of a single machine. As a result, high-performance distributed RDF database systems are needed to evaluate SPARQL queries efficiently. A large body of work on distributed RDF data management has followed different approaches, and in this paper we provide an overview of it. This survey considers three kinds of distributed data management approaches: cloud-based approaches, partitioning-based approaches, and federated RDF systems. Simply speaking, cloud-based approaches use existing cloud platforms to manage large RDF datasets; partitioning-based approaches divide an RDF graph into several fragments and place each fragment at a different site in a distributed system; and federated RDF systems do not allow the data to be repartitioned, since the data are already distributed over their own autonomous sites. For each kind of approach, we provide further discussion to help readers understand the characteristics of the different approaches.
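The partitioning-based idea can be illustrated with a minimal sketch. It assumes one common strategy, hashing each triple on its subject, which is not necessarily the scheme used by any particular system covered by the survey; the function names (hash_partition, eval_bgp_local, eval_star_distributed) and the ex: sample data are hypothetical. Under subject hashing, every triple that shares a subject lands on the same site, so a star-shaped query (all triple patterns share one subject variable) can be answered at each site independently and the partial results simply unioned. Queries of other shapes would need cross-site joins, which this sketch does not handle.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]


def hash_partition(triples: List[Triple], num_sites: int) -> Dict[int, List[Triple]]:
    """Hypothetical scheme: place each triple at a site chosen by hashing its subject."""
    fragments: Dict[int, List[Triple]] = defaultdict(list)
    for s, p, o in triples:
        # crc32 is deterministic across runs, unlike Python's salted str hash().
        site = zlib.crc32(s.encode("utf-8")) % num_sites
        fragments[site].append((s, p, o))
    return dict(fragments)


def match(pattern: Triple, triple: Triple) -> Optional[Binding]:
    """Match one triple pattern (terms starting with '?' are variables) against a triple."""
    binding: Binding = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if binding.get(term, value) != value:
                return None  # same variable bound to two different values
            binding[term] = value
        elif term != value:
            return None  # constant term does not match
    return binding


def eval_bgp_local(patterns: List[Triple], fragment: List[Triple]) -> List[Binding]:
    """Evaluate a basic graph pattern inside one fragment with nested-loop joins."""
    results: List[Binding] = [{}]
    for pattern in patterns:
        extended: List[Binding] = []
        for b in results:
            for t in fragment:
                m = match(pattern, t)
                # Keep only bindings that agree on shared variables.
                if m is not None and all(b.get(k, v) == v for k, v in m.items()):
                    extended.append({**b, **m})
        results = extended
    return results


def eval_star_distributed(patterns: List[Triple], fragments: Dict[int, List[Triple]]) -> List[Binding]:
    """Correct only for star queries under subject hashing: union per-site answers."""
    answers: List[Binding] = []
    for site in sorted(fragments):
        answers.extend(eval_bgp_local(patterns, fragments[site]))
    return answers


# Hypothetical sample data and a star query sharing the subject variable ?x.
TRIPLES: List[Triple] = [
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Alice", "ex:age", "30"),
    ("ex:Bob", "ex:knows", "ex:Carol"),
    ("ex:Bob", "ex:age", "25"),
    ("ex:Carol", "ex:age", "41"),
]
QUERY: List[Triple] = [("?x", "ex:knows", "?y"), ("?x", "ex:age", "?a")]

if __name__ == "__main__":
    fragments = hash_partition(TRIPLES, num_sites=3)
    for binding in eval_star_distributed(QUERY, fragments):
        print(binding)
```

Comparing the distributed answers with eval_bgp_local(QUERY, TRIPLES), a single-machine evaluation over all triples, should yield the same set of bindings, since no star-query match spans two sites under this placement.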

Key words: RDF data management, SPARQL query processing, distributed database system, cloud computing, linked data
