ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (6): 1278-1287. doi: 10.7544/issn1000-1239.2015.20150139

Special topic: 2015 Architecture for Application Domain Requirements

• System Architecture •

A Data Deduplication-Based Primary Storage System in Cloud-of-Clouds

Mao Bo1, Ye Geyan1, Lan Yanjia1, Zhang Yangsong1, Wu Suzhen2,3

  1. Software School, Xiamen University, Xiamen, Fujian 361005; 2. Computer Science Department, School of Information Science and Technology, Xiamen University, Xiamen, Fujian 361005; 3. State Key Laboratory of High-End Server & Storage Technology (Shandong Institute of Massive Information Technology), Jinan 250101 (maobo@xmu.edu.cn)
  • Published: 2015-06-01
  • Online: 2015-06-01
  • Supported by:
    the National Natural Science Foundation of China (61472336, 61402385); the National Key Technology R&D Program of China (2015BAH16F02); the Scientific Research Foundation for Returned Overseas Chinese Scholars, Ministry of Education; the Open Project of the State Key Laboratory of High-End Server & Storage Technology (2014HSSA04); the Fundamental Research Funds for the Central Universities (20720140515)

Abstract: With the rapid development of cloud storage technology, more and more companies and users are moving their data from local storage to cloud storage providers. However, relying on a single cloud storage provider carries serious risks, such as vendor lock-in and reduced data availability and security. To address these problems, this paper proposes a deduplication-based primary storage system in cloud-of-clouds. The system first eliminates redundant data blocks and then distributes the remaining blocks among multiple independent cloud storage providers, choosing between a replication layout and an erasure-code layout according to how often each block is referenced in the deduplicated data set. Replication is easy to implement and deploy but has high storage overhead. Erasure coding has low storage overhead but incurs computational overhead for encoding and decoding. To exploit the advantages of both layouts together with the reference characteristics of deduplicated data, highly referenced data blocks are stored with replication and all other data blocks are stored with erasure coding. Experiments on a lightweight prototype implementation show that the proposed system significantly improves both performance and cost efficiency compared with existing cloud-of-clouds storage schemes.
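
The placement rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' prototype: the fixed 4 KB block size, the SHA-256 fingerprint, the reference-count threshold, the number of clouds, and the (k, m) erasure-code parameters are all assumptions chosen for illustration, and the erasure code itself is only accounted for in the storage estimate rather than actually computed.

```python
import hashlib
from collections import Counter

# All constants below are illustrative assumptions, not values from the paper.
BLOCK_SIZE = 4096      # assumed fixed-size chunking
HOT_THRESHOLD = 2      # assumed: a block referenced at least this often is "highly referenced"
NUM_CLOUDS = 4         # assumed number of providers; a replicated block is copied to each
EC_K, EC_M = 3, 1      # assumed erasure code: k data fragments plus m parity fragments


def fingerprint(block: bytes) -> str:
    """Content fingerprint used to detect duplicate blocks."""
    return hashlib.sha256(block).hexdigest()


def deduplicate(data: bytes, block_size: int = BLOCK_SIZE):
    """Split data into fixed-size blocks; return unique blocks and their reference counts."""
    unique = {}
    refs = Counter()
    for offset in range(0, len(data), block_size):
        block = data[offset:offset + block_size]
        fp = fingerprint(block)
        unique.setdefault(fp, block)   # keep only the first copy of each block
        refs[fp] += 1                  # count every reference to it
    return unique, refs


def choose_layout(ref_count: int, threshold: int = HOT_THRESHOLD) -> str:
    """Highly referenced blocks are replicated; all other blocks are erasure coded."""
    return "replication" if ref_count >= threshold else "erasure_code"


def stored_bytes(block_len: int, layout: str) -> float:
    """Estimated bytes stored across all clouds for one unique block."""
    if layout == "replication":
        return block_len * NUM_CLOUDS
    return block_len * (EC_K + EC_M) / EC_K


def plan(data: bytes):
    """Deduplicate data, pick a layout per unique block, and estimate total storage."""
    unique, refs = deduplicate(data)
    layouts = {fp: choose_layout(refs[fp]) for fp in unique}
    total = sum(stored_bytes(len(unique[fp]), layouts[fp]) for fp in unique)
    return layouts, refs, total


if __name__ == "__main__":
    sample = b"A" * BLOCK_SIZE * 3 + b"B" * BLOCK_SIZE + b"C" * BLOCK_SIZE
    layouts, refs, total = plan(sample)
    for fp, layout in layouts.items():
        print(fp[:8], refs[fp], layout)
    print("estimated bytes stored:", total)
```

In this sketch the block that appears three times is replicated, so it can be read from any single provider without decoding, while the two singly referenced blocks take the cheaper erasure-coded layout. How the real system sets the threshold and the code parameters is not stated in the abstract.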

Key words: cloud-of-clouds, data deduplication, data layout, replication, erasure code

CLC number: