ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (9): 1988-2000. doi: 10.7544/issn1000-1239.2019.20190048

Proactive Locally Repairable Codes for Cloud Storage Systems

Zhang Xiaoyang¹, Xu Jiahao¹, Hu Yuchong¹,²

  1. School of Computer Science and Technology, Huazhong University of Science and Technology, Wuhan 430074
  2. Shenzhen Huazhong University of Science and Technology Research Institute, Shenzhen, Guangdong 518000
  • Online:2019-09-10
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61872414, 61502191) and Shenzhen Knowledge Innovation Program (JCYJ20170307172447622).

Abstract: Cloud storage systems, which must give customers reliable access to their data, have begun to adopt a family of codes called locally repairable codes (LRC); examples include Windows Azure Storage and Facebook’s HDFS RAID. Compared with Reed-Solomon codes, LRC repairs more efficiently: it divides the data blocks of each stripe into groups and adds a local parity block to each group, so that a failed block can be repaired locally within its group. LRC assumes that all groups are of equal size, which implies that every failed block is repaired from the same amount of data in its group. However, blocks on disks that are more likely to fail should be repaired more efficiently. In this paper, we present a proactive LRC (pLRC) that predicts disk failures and resizes the groups so that disks predicted to fail soon can be repaired faster, while maintaining the same storage overhead and code construction as LRC. We analyze pLRC through reliability modeling of mean-time-to-data-loss (MTTDL) and implement pLRC in Facebook’s HDFS. The results show that, compared with LRC, pLRC improves reliability by up to 113%, and improves degraded read and disk repair performance by up to 46.8% and 47.5%, respectively.
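
The group-resizing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' construction: it assumes each local parity is a plain byte-wise XOR of its group, omits global parities and the failure predictor entirely, and uses hypothetical names (`xor_blocks`, `encode_local_parities`, `repair`). It only shows that shrinking the group that holds an at-risk block reduces the number of blocks read during repair, while the number of local parities (and hence the storage overhead) stays the same.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode_local_parities(data_blocks, group_sizes):
    """Split data blocks into consecutive groups and compute one XOR parity per group.

    Assumption: local parity = XOR of the group's data blocks (illustrative only).
    """
    if sum(group_sizes) != len(data_blocks):
        raise ValueError("group sizes must cover all data blocks")
    groups, start = [], 0
    for size in group_sizes:
        groups.append(list(range(start, start + size)))
        start += size
    parities = [xor_blocks([data_blocks[i] for i in g]) for g in groups]
    return groups, parities


def repair(data_blocks, groups, parities, failed):
    """Rebuild one failed data block from its own group.

    Returns (rebuilt_block, number_of_blocks_read).
    """
    for group, parity in zip(groups, parities):
        if failed in group:
            survivors = [data_blocks[i] for i in group if i != failed]
            return xor_blocks(survivors + [parity]), len(survivors) + 1
    raise ValueError("unknown block index")


# Six data blocks, two local groups in both layouts (same storage overhead).
data = [bytes([i] * 4) for i in range(1, 7)]

# Equal-size groups: every repair reads 3 blocks.
eq_groups, eq_parities = encode_local_parities(data, [3, 3])
_, eq_reads = repair(data, eq_groups, eq_parities, failed=0)

# Resized groups: block 0 is assumed to sit on a disk flagged as likely to fail,
# so its group is shrunk to 2 data blocks and the other grows to 4.
rs_groups, rs_parities = encode_local_parities(data, [2, 4])
rebuilt, rs_reads = repair(data, rs_groups, rs_parities, failed=0)

print(eq_reads, rs_reads)
```

In this toy layout the at-risk block is repaired by reading 2 blocks instead of 3, at the price of blocks in the larger group needing 4 reads. That trade-off only pays off when the failure prediction is reasonably accurate.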

Key words: cloud storage, locally repairable codes (LRC), disk failures, machine learning, decision tree
