A Highly Scalable RDF Data Storage System
Abstract: The Resource Description Framework (RDF) is flexible and concise, and it has been accepted as a standard for expressing metadata and for linking data on the World Wide Web. The volume of RDF data has grown rapidly in recent years, so systems that store it must be highly scalable. This paper presents TripleBit, a highly scalable RDF data storage system. To keep storage consumption as low as possible, TripleBit uses delta compression and variable-length integer encoding. Data is stored in chunks, which simplifies storage and buffer management, keeps the storage structure compact, and speeds up data reads. The system generates query plans dynamically from heuristic rules and adjusts a plan during execution according to the intermediate results, so that the execution order stays optimal. For queries with multiple variables, a two-stage execution strategy reduces the intermediate results produced during query processing. In a comparison with popular RDF stores, RDF-3X requires at least 40% more storage space than TripleBit, and TripleBit's query performance is several times better than that of RDF-3X and MonetDB.
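To make the storage idea concrete, the following is a minimal sketch of how delta compression can be combined with variable-length integer encoding for a sorted list of integer IDs. It is an illustration only: the function names (`encode_varint`, `delta_varint_encode`, `delta_varint_decode`) and the 7-bits-per-byte layout are assumptions made for this example and do not describe TripleBit's actual on-disk format.

```python
from typing import Iterable, List


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer with 7 payload bits per byte.

    Hypothetical layout for illustration: the high bit of each byte
    signals that more bytes follow (least significant group first).
    """
    if value < 0:
        raise ValueError("varint encoding expects a non-negative integer")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set
        else:
            out.append(byte)         # last byte of this integer
            return bytes(out)


def delta_varint_encode(sorted_ids: Iterable[int]) -> bytes:
    """Store each ID as the gap to its predecessor, then varint-encode the gap."""
    out = bytearray()
    previous = 0
    for current in sorted_ids:
        # Gaps between sorted IDs are small, so most fit in one byte.
        out += encode_varint(current - previous)
        previous = current
    return bytes(out)


def delta_varint_decode(data: bytes) -> List[int]:
    """Invert delta_varint_encode: rebuild gaps, then accumulate them."""
    ids: List[int] = []
    previous = 0
    value = 0
    shift = 0
    for byte in data:
        value |= (byte & 0x7F) << shift
        if byte & 0x80:
            shift += 7               # more bytes belong to this gap
        else:
            previous += value        # gap complete: recover the ID
            ids.append(previous)
            value = 0
            shift = 0
    return ids
```

In this sketch, four IDs that would occupy 16 bytes as fixed 32-bit integers (for example 1000, 1003, 1010, 1500) are stored in 6 bytes, because only the first value and the larger gap need two bytes each. The input must be sorted in ascending order; otherwise a negative gap raises `ValueError`.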