ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (6): 1278-1287. doi: 10.7544/issn1000-1239.2015.20150139

Special Issue: 2015 Architectures for Application-Domain Requirements


A Data Deduplication-Based Primary Storage System in Cloud-of-Clouds

Mao Bo¹, Ye Geyan¹, Lan Yanjia¹, Zhang Yangsong¹, Wu Suzhen²,³

  1. Software School, Xiamen University, Xiamen, Fujian 361005
  2. Computer Science Department, School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005
  3. State Key Laboratory of High-End Server & Storage Technology (Shandong Institute of Massive Information Technology), Jinan 250101
  • Online:2015-06-01

Abstract: With the rapid development of cloud storage technology, more and more companies are uploading their data to cloud storage platforms. However, depending solely on a single cloud storage provider raises a number of potentially serious problems, such as vendor lock-in, reduced availability, and security risks. To address these problems, we propose a deduplication-based primary storage system in cloud-of-clouds that eliminates redundant data blocks in the cloud computing environment and distributes the data among multiple independent cloud storage providers. The data is stored across these providers by combining replication and erasure coding. Replication is easy to implement and deploy but has high storage overhead. Erasure coding has low storage overhead but incurs computational overhead for encoding and decoding. To exploit the advantages of both schemes together with the reference characteristics of deduplicated data, highly referenced data blocks are stored with replication and the remaining data blocks are stored with erasure coding. Experiments on our lightweight prototype implementation show that the proposed system significantly improves performance and cost efficiency over existing schemes.
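The placement rule described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the fixed-size chunking, the SHA-256 fingerprints, the `REF_THRESHOLD` cutoff, and the names `split_blocks` and `plan_layout` are all assumptions made for illustration, since the abstract does not specify how blocks are fingerprinted or where the "highly referenced" boundary lies.

```python
import hashlib
from collections import Counter

CHUNK_SIZE = 4096    # assumed fixed block size; not stated in the abstract
REF_THRESHOLD = 2    # assumed cutoff for a "highly referenced" block


def split_blocks(data: bytes, size: int = CHUNK_SIZE) -> list:
    """Split data into fixed-size blocks (the last block may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def plan_layout(data: bytes, size: int = CHUNK_SIZE,
                threshold: int = REF_THRESHOLD):
    """Deduplicate blocks and pick a storage scheme per unique block.

    Returns (layout, refs): layout maps each fingerprint to
    "replication" or "erasure_code"; refs maps each fingerprint
    to the number of times the block is referenced.
    """
    refs = Counter()
    for block in split_blocks(data, size):
        # Identical blocks share a fingerprint, so each is stored once.
        refs[hashlib.sha256(block).hexdigest()] += 1

    layout = {
        fp: "replication" if count >= threshold else "erasure_code"
        for fp, count in refs.items()
    }
    return layout, refs


if __name__ == "__main__":
    # Three identical blocks plus one distinct block.
    sample = b"A" * CHUNK_SIZE * 3 + b"B" * CHUNK_SIZE
    layout, refs = plan_layout(sample)
    for fp, scheme in layout.items():
        print(fp[:12], refs[fp], scheme)
```

In this sketch the block referenced three times is marked for replication and the block referenced once is marked for erasure coding. The trade-off behind the rule is the usual one: n-way replication stores n full copies of a block, whereas a (k, m) erasure code stores (k + m)/k times the block size but requires encoding and decoding work.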

Key words: cloud-of-clouds, data deduplication, data layout, replication, erasure code

CLC Number: