Concurrent Skiplist Based Double-Layer Index Framework for Cloud Data Processing

Zhou Wei; Lu Jin; Zhou Keren; Wang Shipu; Yao Shaowen

doi:10.7544/issn1000-1239.2015.20140358

(School of Software, Yunnan University, Kunming 650091) (zwei@ynu.edu.cn)

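Funding: National Natural Science Foundation of China (61363021, 61363084); Open Foundation of the Key Laboratory of Software Engineering of Yunnan Province (2011SE01, 2012SE304); Yunnan Provincial Youth Foundation (2012FD004); Scientific Research Foundation of the Yunnan Provincial Department of Education (2014Y013)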

CLC number: TP393
Publication history
- Published: 2015-06-30

Concurrent Skiplist Based Double-Layer Index Framework for Cloud Data Processing

(School of Software, Yunnan University, Kunming 650091)

Abstract

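Abstract: Cloud data processing plays a critical role in cloud computing infrastructure. However, most current cloud storage systems organize data as key-value pairs based on distributed hashing, which supports range queries poorly and offers weak dynamic real-time behavior, so an auxiliary dynamic index is needed in the cloud environment. Based on a summary and analysis of auxiliary double-layer index mechanisms in cloud environments, this paper proposes a double-layer index architecture for cloud data processing based on the concurrent skiplist. The architecture adopts a two-layer structure that breaks through the memory and disk limits of a single machine and thereby extends the overall indexing scope of the system. A dynamic splitting algorithm resolves hot spots on local servers and keeps the load of the overall index structure balanced. The concurrent skiplist raises the load-bearing capacity of the global index, improves its concurrency, and increases the throughput of the overall index. Experimental results show that the concurrent skiplist based double-layer index architecture effectively supports single-key queries and range queries, has strong scalability and concurrency, and is an efficient auxiliary index for cloud storage.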
- cloud computing /
- double-layer index /
- concurrent skiplist /
- range query /
- optimistic concurrency control
Abstract: Cloud data processing is an essential part of the infrastructure of cloud systems. Without efficient index structures, cloud systems cannot sustain the necessary high throughput or provide services for millions of users. However, most existing cloud storage systems adopt a distributed Hash table (DHT) approach to index data, which lacks support for range queries and for dynamic, real-time behavior. A scalable, dynamic index structure that supports multiple query types is therefore needed in the cloud environment. Based on a summary and analysis of double-layer index systems for cloud storage, this paper proposes a novel concurrent skiplist based double-layer index (referred to as CSD-index) for cloud data processing. A two-layer architecture, which can break through the memory and hard drive limits of a single machine, is used to extend the indexing scope. An online migration algorithm that moves skiplist nodes between local servers provides dynamic load balancing. The design and implementation of the concurrent skiplist are discussed in detail. Optimistic concurrency control (OCC) is introduced to enhance concurrency. Through the concurrent skiplist, CSD-index improves the load-bearing capacity of the global index and enhances the overall throughput of the index. Experimental results show that the concurrent skiplist based double-layer index is efficient and is a viable alternative approach for cloud-suitable data structures.
- cloud computing /
- double-layer index /
- concurrent skiplist /
- range query /
- optimistic concurrency control
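
To illustrate why a skiplist suits both single-key and range queries where a hash table does not, the following is a minimal single-threaded sketch in Python. It is not the paper's CSD-index: the names `SkipList`, `insert`, `get`, and `range_query` are hypothetical, and the optimistic concurrency control, the two-layer routing, and the node migration described in the abstract are all omitted.

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i] is the next node at level i


class SkipList:
    """Single-threaded skiplist sketch (assumption: no concurrency control)."""

    MAX_LEVEL = 16

    def __init__(self, p=0.5, seed=0):
        self._p = p
        self._rng = random.Random(seed)  # seeded so runs are repeatable
        self._head = _Node(None, None, self.MAX_LEVEL)
        self._level = 1  # number of levels currently in use

    def _random_level(self):
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self._p:
            level += 1
        return level

    def _find_predecessors(self, key):
        """Return, for each level, the last node whose key is < key."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def insert(self, key, value):
        update = self._find_predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # overwrite an existing key
            return
        level = self._random_level()
        self._level = max(self._level, level)
        node = _Node(key, value, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node

    def get(self, key, default=None):
        candidate = self._find_predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate.value
        return default

    def range_query(self, low, high):
        """Yield (key, value) pairs with low <= key <= high, in key order."""
        node = self._find_predecessors(low)[0].forward[0]
        while node is not None and node.key <= high:
            yield node.key, node.value
            node = node.forward[0]


index = SkipList()
for k in [30, 10, 50, 20, 40]:
    index.insert(k, f"server-{k}")  # hypothetical payload, e.g. a server id

print(index.get(20))                    # server-20
print(list(index.range_query(15, 40)))  # keys 20, 30, 40 in order
```

Because the bottom level is a sorted linked list, a range query locates the lower bound once and then walks forward, which a hash-based key-value layout cannot do without scanning every key.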