ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (11): 2419-2431.doi: 10.7544/issn1000-1239.2020.20190675

Previous Articles     Next Articles

Survey on Data Updating in Erasure-Coded Storage Systems

Zhang Yao, Chu Jiajia, Weng Chuliang   

  1. (School of Data Science and Engineering, East China Normal University, Shanghai 200062)
  • Online:2020-11-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61772204, 61732014).

Abstract: In a distributed storage system, node failure has become a normal state. In order to ensure high availability of data, the system usually adopts data redundancy. At present, there are mainly two kinds of redundancy mechanisms. One is multiple replications, and the other is erasure coding. With the increasing amount of data, the benefits of the multi-copy mechanism are getting lower and lower, and people are turning their attention to erasure codes with higher storage efficiency. However, the complicated rules of the erasure coding itself cause the overhead of the read, write, and update operations of the distributed storage systems using the erasure coding to be larger than that of the multiple copies. Therefore, erasure coding is usually used for cold data or warm data storage. Hot data, which requires frequent access and update, is still stored in multiple copies. This paper focuses on the data update in erasure-coded storage systems, summarizes the current optimization work related to erasure coding update from the aspects of hard disk I/O, network transmission and system optimization, makes a comparative analysis on the update performance of representative coding schemes at present, and finally looks forward to the future research trends. Through analysis, it is concluded that the current erasure coding update schemes still cannot obtain the update performance similar to that of multiple copies. How to optimize the erasure-coded storage system in the context of erasure coding update rules and system architecture, so that it can replace the multi-copy mechanism under the hot data scenario, and reducing the hot data storage overhead is still a problem worthy of further study in the future.

Key words: erasure codes, distributed storage systems, data update, multiple copies, storage overhead

CLC Number: