Abstract:
In distributed storage systems, node failures are the norm rather than the exception. To keep data highly available, such systems usually store data redundantly, using one of two main mechanisms: replication (multiple copies) or erasure coding. As data volumes grow, the multi-copy mechanism becomes less and less cost-effective, and attention is turning to erasure codes, which offer higher storage efficiency. However, the complex encoding rules of erasure codes make read, write, and update operations more expensive in erasure-coded storage systems than in replicated ones. Erasure coding is therefore usually reserved for cold or warm data, while hot data, which is accessed and updated frequently, is still stored as multiple copies. This paper focuses on data updates in erasure-coded storage systems. It summarizes current work on optimizing erasure coding updates from three aspects (disk I/O, network transmission, and system-level optimization), compares the update performance of representative coding schemes, and discusses future research trends. The analysis concludes that current erasure coding update schemes still cannot match the update performance of replication. How to optimize erasure-coded storage systems, in terms of both update rules and system architecture, so that they can replace replication for hot data and thereby reduce hot-data storage overhead remains a problem worthy of further study.