ISSN 1000-1239 CN 11-1777/TP

• System Architecture •

### A Cross-Datacenter Erasure Code Writing Method Based on Generator Matrix Transformation

Bao Han1,2, Wang Yijie1,2, and Xu Fangliang2

1. 1(National Laboratory for Parallel and Distributed Processing (National University of Defense Technology), Changsha 410073); 2(College of Computer, National University of Defense Technology, Changsha 410073) (hanb_nudt@foxmail.com)
• Published online: 2020-02-01
• Supported by:
This work was supported by the National Key Research and Development Program of China (2016YFB1000101), the National Natural Science Foundation of China (61379052), the Science Foundation of Ministry of Education of China (2018A02002), and the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (14JJ1026).

Abstract: In cross-datacenter storage systems, existing erasure code writing methods usually have low encoding efficiency, low transmission efficiency, and high network resource consumption. As a result, cross-datacenter erasure codes usually have a low write rate. This paper proposes a cross-datacenter erasure code writing method based on generator matrix transformation, called CREW. Specifically, we first propose a greedy strategy-based transmission topology construction algorithm called GBTC, which builds a tree-structured transmission topology from top to bottom with incremental weights (the weights are set to the network distances between datacenters) to organize data transmission between datacenters. Then, we propose a generator matrix transformation algorithm called GMT. Without changing the linear relationship of coded blocks, GMT transforms the generator matrix so that the number of data blocks related to a coded block is negatively correlated with the network distance between the datacenter where the coded block is located and the root of the tree-structured topology. Therefore, CREW needs to transfer only a small number of data blocks over long network distances to write data, which reduces network resource consumption. Finally, we propose a distributed pipelined writing algorithm called DPW, which distributes encoding operations to different nodes for parallel execution and limits the number of times data blocks are forwarded, thereby improving encoding efficiency and transmission efficiency. Experiments show that, compared with the writing methods of traditional erasure codes, CREW increases the write rate by 36.3%~57.9%; compared with the existing writing method for cross-datacenter erasure codes (IncEncoding), CREW increases the write rate by 32.4%.
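
The idea of greedily building a tree-structured transmission topology from the root downward, weighted by network distance, can be illustrated with a minimal sketch. It assumes a Prim-style rule (repeatedly attach the closest not-yet-connected datacenter to the tree), which is a stand-in and not necessarily how GBTC itself works; the function name `greedy_tree` and the distance matrix `DIST` are hypothetical.

```python
def greedy_tree(dist, root):
    """Prim-style greedy construction (an assumption, not the paper's GBTC).

    dist: symmetric matrix of network distances between datacenters.
    Returns a {child: parent} map for every non-root datacenter.
    """
    n = len(dist)
    in_tree = {root}
    parent = {}
    while len(in_tree) < n:
        # Pick the cheapest edge from the current tree to an outside node.
        _, u, v = min(
            (dist[u][v], u, v)
            for u in in_tree
            for v in range(n)
            if v not in in_tree
        )
        parent[v] = u
        in_tree.add(v)
    return parent


# Hypothetical distances (for example, round-trip times in ms)
# between four datacenters; datacenter 0 is the writer (root).
DIST = [
    [0, 10, 40, 80],
    [10, 0, 25, 60],
    [40, 25, 0, 30],
    [80, 60, 30, 0],
]
tree = greedy_tree(DIST, root=0)
print(tree)
```

With these made-up distances the result is a chain 0 → 1 → 2 → 3 whose edge weights (10, 25, 30) grow from the root toward the leaves, so the datacenters farthest from the writer sit deepest in the tree.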