ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2021, Vol. 58, Issue (2): 293-304. doi: 10.7544/issn1000-1239.2021.20200340

Special topic: 2021 Storage Systems and Intelligent Storage Technologies in the Big Data Era

Section: System Architecture

Multi-Replica Cloud Data Storage Based on Hierarchical Network Coding

Xu Guangwei, Shi Chunhong, Feng Xiangyang, Luo Xin, Shi Xiujin, Han Songhua, Li Wei   

  1. (School of Computer Science and Technology, Donghua University, Shanghai 201620) (gwxu@dhu.edu.cn)
  • Online: 2021-02-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61772018, 61772128), the Natural Science Foundation of Shanghai (19ZR1402000, 17ZR1400200), and the Shanghai Education and Scientific Research Project (C160076).

Abstract: The rapid development of cloud data storage places high demands on the availability of stored data. Currently, data availability is mainly ensured by using erasure coding to compute coded blocks for the stored data and then storing these redundant coded blocks in a distributed manner across the cloud storage space. Although this coding technique ensures the security of stored data and reduces extra storage space, it incurs large computation and communication overhead when corrupted data are recovered. This paper proposes a multi-replica generation and corrupted-data recovery algorithm based on hierarchical network coding. The algorithm improves the coding matrix of erasure coding with hierarchical network coding to form a hierarchical coding matrix, and exploits the cascade structure of this matrix to generate hierarchical codes (HC codes) that constitute the multiple replicas, so that the replicas are related to one another through coding. When corrupted data are recovered, the encoding information provided by the data owner and the intact data blocks stored by the cloud server are combined in a direct computation to rebuild the corrupted blocks, which avoids downloading data remotely from the cloud storage space. Theoretical analysis and simulation experiments show that, under the same storage space, the proposed algorithm significantly reduces the communication overhead of corrupted-data recovery and improves the availability of stored data.
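
The abstract describes the mechanism only at a high level. The Python sketch below illustrates the general idea under illustrative assumptions, not the authors' construction. It uses a hierarchical coding matrix in which every coded block at level l mixes only the first prefix[l] source blocks (the cascade), with Vandermonde-style coefficients over the prime field GF(257). Repair works in two steps: the data owner, who knows the matrix, computes a short coefficient vector λ, and the server rebuilds a corrupted coded block by combining intact blocks it already holds. The names P, hierarchical_matrix, encode, solve_combination, and repair are hypothetical.

```python
"""Hedged sketch of hierarchical (nested-prefix) linear coding and repair.

Assumptions for illustration only (not the paper's construction):
  * arithmetic is over the prime field GF(P) with P = 257;
  * a coded block at level l mixes only the first prefix[l] source blocks;
  * row coefficients are powers of a distinct nonzero alpha (Vandermonde-like).
"""

P = 257  # hypothetical field size; every byte value 0..255 is a field element


def hierarchical_matrix(prefix, rows_per_level):
    """Build the coding matrix; return (rows, level index of each row)."""
    k = prefix[-1]
    matrix, levels, alpha = [], [], 1
    for lvl, (width, count) in enumerate(zip(prefix, rows_per_level)):
        for _ in range(count):
            # Nonzero coefficients only in columns [0, width): the cascade.
            matrix.append([pow(alpha, c, P) if c < width else 0 for c in range(k)])
            levels.append(lvl)
            alpha += 1  # a distinct alpha per row keeps rows independent
    return matrix, levels


def encode(matrix, blocks):
    """Coded block i = sum_c matrix[i][c] * blocks[c], symbol by symbol, mod P."""
    size = len(blocks[0])
    return [[sum(a * b[j] for a, b in zip(row, blocks)) % P for j in range(size)]
            for row in matrix]


def solve_combination(helpers, target):
    """Find lam with sum_i lam[i] * helpers[i] == target (mod P), else None."""
    m, k = len(helpers), len(target)
    # One equation per matrix column; the unknowns are the m helper weights.
    aug = [[helpers[i][c] for i in range(m)] + [target[c]] for c in range(k)]
    pivots, r = [], 0
    for col in range(m):
        piv = next((i for i in range(r, k) if aug[i][col]), None)
        if piv is None:
            continue  # free variable
        aug[r], aug[piv] = aug[piv], aug[r]
        inv = pow(aug[r][col], P - 2, P)  # Fermat inverse in GF(P)
        aug[r] = [v * inv % P for v in aug[r]]
        for i in range(k):
            if i != r and aug[i][col]:
                f = aug[i][col]
                aug[i] = [(a - f * b) % P for a, b in zip(aug[i], aug[r])]
        pivots.append(col)
        r += 1
        if r == k:
            break
    # An all-zero equation with a nonzero right-hand side means no solution.
    if any(not any(row[:m]) and row[m] for row in aug):
        return None
    lam = [0] * m  # free variables are set to 0
    for i, col in enumerate(pivots):
        lam[col] = aug[i][m]
    return lam


def repair(matrix, levels, coded, bad):
    """Owner computes lam; server rebuilds coded[bad] from intact local blocks.

    In this sketch only blocks at the same or a lower level are offered as
    helpers; with enough of them intact they span the corrupted row's prefix.
    """
    helpers = [i for i in range(len(matrix))
               if i != bad and levels[i] <= levels[bad]]
    lam = solve_combination([matrix[i] for i in helpers], matrix[bad])
    if lam is None:
        return None, helpers
    size = len(coded[bad])  # block length is treated as known metadata
    rebuilt = [sum(w * coded[i][j] for w, i in zip(lam, helpers)) % P
               for j in range(size)]
    return rebuilt, helpers


if __name__ == "__main__":
    # Hypothetical layout: 4 source blocks; level 0 mixes the first 2, level 1 all 4.
    source = [list(b"clou"), list(b"d-da"), list(b"ta-r"), list(b"epl!")]
    A, levels = hierarchical_matrix([2, 4], [3, 3])
    coded = encode(A, source)
    for bad in (0, 4):  # one level-0 block and one level-1 block
        rebuilt, used = repair(A, levels, coded, bad)
        print(bad, used, rebuilt == coded[bad])
```

Run as a script, it prints `0 [1, 2] True` and `4 [0, 1, 2, 3, 5] True`. The level-0 block is rebuilt from the two other level-0 blocks alone, while the level-1 block needs helpers from both levels. Only the short vector λ has to reach the server. GF(257) is used here purely so that ordinary integer arithmetic suffices. Practical erasure-code libraries typically work over GF(2^8), where every coded symbol fits in one byte.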

Key words: cloud storage, multi-replica, hierarchical network coding, hierarchical coding matrix, data recovery
