Abstract:
With the rapid development of cloud storage technology, more and more companies are uploading their data to cloud storage platforms. However, depending on a single cloud storage provider raises a number of potentially serious problems, such as vendor lock-in, limited availability, and security risks. To address these problems, we propose a deduplication-based primary storage system for the cloud-of-clouds that eliminates redundant data blocks and distributes the data among multiple independent cloud storage providers. The data is stored across the providers by combining replication and erasure coding. Replication is easy to implement and deploy but incurs high storage overhead. Erasure coding has low storage overhead but requires additional computation for encoding and decoding. To exploit the advantages of both schemes and the reference characteristics of deduplicated data, highly referenced data blocks are stored with replication and the remaining data blocks are stored with erasure coding. Experiments on our lightweight prototype implementation show that the proposed system significantly improves performance and cost efficiency over existing schemes.
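
As an illustration only, the reference-aware placement idea described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the prototype's implementation: the SHA-256 fingerprinting, the reference threshold `REF_THRESHOLD`, the function names, and the example parameters (3 replicas, a (k=4, m=2) erasure code) are all hypothetical, since the abstract does not specify them.

```python
import hashlib
from collections import Counter

# Hypothetical cutoff: the abstract does not state how many references
# make a block "highly referenced".
REF_THRESHOLD = 3


def fingerprint(block: bytes) -> str:
    """Content fingerprint used to detect duplicate blocks (assumed SHA-256)."""
    return hashlib.sha256(block).hexdigest()


def plan_placement(blocks, threshold=REF_THRESHOLD):
    """Deduplicate blocks and pick a redundancy scheme per unique block.

    Returns a dict mapping each unique fingerprint to "replication"
    (reference count >= threshold) or "erasure_code" (otherwise).
    """
    ref_counts = Counter(fingerprint(b) for b in blocks)
    return {
        fp: "replication" if count >= threshold else "erasure_code"
        for fp, count in ref_counts.items()
    }


def replication_overhead(replicas: int) -> float:
    """Stored bytes per logical byte when keeping `replicas` full copies."""
    return float(replicas)


def erasure_overhead(k: int, m: int) -> float:
    """Stored bytes per logical byte for k data fragments plus m parity fragments."""
    return (k + m) / k


if __name__ == "__main__":
    # Four references to one block, one reference to another.
    blocks = [b"popular"] * 4 + [b"rare"]
    for fp, scheme in plan_placement(blocks).items():
        print(fp[:8], scheme)
    # Illustrative overhead comparison with assumed parameters.
    print(replication_overhead(3), erasure_overhead(4, 2))
```

With these assumed parameters, three-way replication stores three bytes per logical byte while the (4, 2) erasure code stores 1.5, which is the storage trade-off the hybrid scheme aims to balance against encoding and decoding cost.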