ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2014, Vol. 51, Issue (8): 1671-1680. doi: 10.7544/issn1000-1239.2014.20121095

• System Architecture •

Performance Study of Exact Minimum Bandwidth Regenerating Codes in Distributed Storage

Wei Dongsheng, Li Jun, Wang Xin

  1. (Shanghai Key Laboratory of Intelligent Information Processing (School of Computer Science, Fudan University), Shanghai 201203) (12210240069@fudan.edu.cn)
  • Published: 2014-08-15
  • Online: 2014-08-15
  • Funding: 
    National Natural Science Foundation of China (61171074); National High Technology Research and Development Program of China (863 Program) (2009AA01A348); Program for New Century Excellent Talents in University, Ministry of Education (NCET-11-0113)

Abstract: Distributed storage systems introduce redundancy to keep data reliable in the face of node failures, and repairing a failed node consumes a significant amount of bandwidth. Regenerating codes achieve the optimal tradeoff between storage overhead and repair bandwidth overhead. Because bandwidth is more precious than computing resources in distributed storage systems, exact minimum bandwidth regenerating (E-MBR) codes are a promising candidate for such systems: they can be implemented by a product-matrix construction, they combine the advantages of regenerating codes and systematic codes, and they place no restrictions on the construction parameters. However, the performance overhead of distributed storage systems built on this coding scheme has not been thoroughly investigated and analyzed. This paper gives a formal description of the coding operations, which fall into three distinct phases: uploading, downloading, and repairing. We analyze the impact of CPU utilization, file size, buffer size, and Galois field size on the computing rate in each of the three phases. We find that distributed storage systems based on E-MBR codes can achieve high computing throughput when the construction parameters of the codes are configured appropriately.

Key words: distributed storage, regenerating codes, network coding, product-matrix, performance study

CLC Number: