ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development), 2018, Vol. 55, Issue (1): 1-13. doi: 10.7544/issn1000-1239.2018.20160506

• Survey •

纠删码存储系统单磁盘错误重构优化方法综述

傅颖勋1,文士林1,马礼1,舒继武2   

  1. 1(北方工业大学计算机学院 北京 100144);2(清华大学计算机科学与技术系 北京 100084) (mooncape1986@126.com)
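  • Published: 2018-01-01
  • Supported by: 
    National Natural Science Foundation of China (61232003, 61702013); Beijing Outstanding Talent Training Program (2016000020124G016); Science and Technology Program of the Beijing Municipal Education Commission (KM201710009008); Academic Innovation Team Program of North China University of Technology (XN018001); Research Start-up Program of North China University of Technology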

Survey on Single Disk Failure Recovery Methods for Erasure Coded Storage Systems

Fu Yingxun1, Wen Shilin1, Ma Li1, Shu Jiwu2   

  1. 1(College of Computer Science, North China University of Technology, Beijing 100144);2(Department of Computer Science and Technology, Tsinghua University, Beijing 100084)
  • Online: 2018-01-01

摘要: 随着云存储的迅猛发展与大数据时代的来临,越来越多的存储系统开始采用纠删码技术,以保障数据的可靠性.在基于纠删码的存储系统中,一旦有磁盘出错,系统需根据其他磁盘里存储的冗余信息,重构所有失效数据.由于当前存储系统中绝大部分磁盘错误都是单磁盘错误,因此,如何快速地在单磁盘错误的情况下重构失效数据,已成为存储系统的研究热点.首先介绍了存储系统中基于纠删码的单磁盘错误重构优化方法的研究背景与研究意义,给出了纠删码的基本概念与定义,并分析了单磁盘错误重构优化的基本原理;接着归纳了现有的一些主流单磁盘错误重构方法的构造算法及其优缺点与适用范围,并分类介绍了一些用于优化单磁盘错误重构效率的新型纠删码技术;最后指出了存储系统中基于纠删码的磁盘错误重构方法的进一步研究方向.

关键词: 存储系统, 纠删码, 可靠性, 磁盘错误, 数据重构方法

Abstract: With the rapid development of cloud storage, erasure codes, which can tolerate multiple disk failures at low storage overhead, have attracted considerable attention. Storage systems that implement erasure codes are called erasure coded storage systems. Once a disk fails, such a system must read the information stored on the surviving disks and reconstruct the lost information with a recovery algorithm. As storage systems grow in scale, disk failures happen frequently, and most of them are single disk failures. Therefore, how to quickly recover lost data after a single disk failure has become a key problem for erasure coded storage systems. In this paper, we first introduce the background and significance of single disk failure recovery, and then give some fundamental terms and principles of erasure codes. Afterward, we illustrate the hybrid recovery principle, elaborate on the key ideas of current construction-based and search-based recovery methods, and summarize their typical application scenarios. We also summarize some new erasure coding techniques that optimize single disk failure recovery efficiency. Finally, we discuss future research directions for disk failure recovery in erasure coded storage systems.

Key words: storage system, erasure code, reliability, disk failure, data recovery method

CLC Number: