Survey on Single Disk Failure Recovery Methods for Erasure Coded Storage Systems

Fu Yingxun1, Wen Shilin1, Ma Li1, Shu Jiwu2   

  1. College of Computer Science, North China University of Technology, Beijing 100144; 2. Department of Computer Science and Technology, Tsinghua University, Beijing 100084
  • Online: 2018-01-01

Abstract: With the rapid development of cloud storage, erasure codes, which can tolerate multiple disk failures with low storage overhead, have attracted considerable attention. Storage systems that implement erasure codes are called erasure coded storage systems. When a disk fails, such a system must read the information stored on the surviving disks and then reconstruct the lost information with a recovery algorithm. As storage systems grow in scale, disk failures become frequent, and most of them are single disk failures. Therefore, how to quickly recover the lost data after a single disk failure has become a key problem for erasure coded storage systems. In this paper, we first introduce the background and significance of single disk failure recovery, and then present the fundamental terms and principles of erasure codes. Afterward, we illustrate the hybrid recovery principle, elaborate on the key ideas of current construction-based and search-based recovery methods, and summarize their typical application scenarios. We also summarize some new erasure coding techniques that optimize single disk failure recovery efficiency. Finally, we discuss future research directions for disk failure recovery in erasure coded storage systems.

Key words: storage system, erasure code, reliability, disk failure, data recovery method

