
    Reliability Analysis of Cluster RAID5 Storage System

    • Abstract: Cluster storage offers high concurrency, high scalability, and high cost-effectiveness, and has become an important technology for building large-scale storage systems. As system scale grows, however, failure events become increasingly frequent, so improving system reliability and ensuring continuous, effective data access is a problem that must be solved. A cluster RAID system extends traditional RAID technology to cluster storage and is an effective approach to the reliability problem. Motivated by the need to design a cluster RAID5 system, this paper proposes a Markov-based reliability model of a cluster RAID5 storage system, used to quantitatively analyze how parameters such as storage topology, node and disk mean time to failure (MTTF), and rebuild rate affect the reliability of the cluster RAID5 system. The analysis shows that: (1) multi-tier cluster RAID5 has higher system reliability than single-tier cluster RAID5 and is better suited to building a cluster RAID system; (2) increasing the rebuild rate yields an improvement in reliability of roughly the same magnitude; (3) for a fixed level of reliability, increasing the rebuild rate lowers the node MTTF that the system requires, and a 10-fold increase in rebuild rate can reduce the required node MTTF to as little as one-seventh of its original value.
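The abstract does not reproduce the paper's model. To make the second finding concrete, here is a minimal sketch based on the classical three-state Markov chain for a single, conventional RAID5 group. This is a textbook simplification, not the cluster or multi-tier model the paper proposes. The function name `raid5_mttdl` and its parameters are hypothetical, and the sketch assumes exponentially distributed failure and rebuild times with independent disks.

```python
def raid5_mttdl(n_disks: int, mttf_hours: float, mttr_hours: float) -> float:
    """Mean time to data loss (hours) of one RAID5 group.

    Classical three-state Markov chain (an assumption of this sketch,
    not the paper's cluster model):
      state 0: all disks healthy   -> state 1 at rate n * lam
      state 1: one disk failed     -> state 0 at rate mu (rebuild)
                                   -> data loss at rate (n - 1) * lam
    Solving the expected-absorption-time equations gives the closed form
      MTTDL = ((2n - 1) * lam + mu) / (n * (n - 1) * lam**2)
    """
    if n_disks < 2:
        raise ValueError("RAID5 needs at least two disks in this model")
    lam = 1.0 / mttf_hours   # per-disk failure rate
    mu = 1.0 / mttr_hours    # rebuild (repair) rate
    return ((2 * n_disks - 1) * lam + mu) / (n_disks * (n_disks - 1) * lam ** 2)


def rebuild_speedup_gain(n_disks: int, mttf_hours: float,
                         mttr_hours: float, speedup: float) -> float:
    """Ratio of MTTDL after rebuilding `speedup` times faster to MTTDL before."""
    before = raid5_mttdl(n_disks, mttf_hours, mttr_hours)
    after = raid5_mttdl(n_disks, mttf_hours, mttr_hours / speedup)
    return after / before
```

With illustrative inputs such as 8 disks, a 100,000-hour disk MTTF, and a 24-hour rebuild, `rebuild_speedup_gain(8, 100_000, 24, 10)` comes out just under 10. The rebuild rate dominates the numerator when rebuilds are much faster than failures, so in this simplified model a faster rebuild gives a nearly proportional reliability gain, in line with the abstract's second finding.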

       
