Abstract:
Modern storage systems are growing ever larger, and the rapidly increasing number of storage devices makes device failures a frequent occurrence in large-scale storage systems. Data replication has therefore been widely adopted to enhance storage system reliability. When designing a large-scale storage system, many factors affect its reliability, such as failure detection latency, storage node capacity, data object size, and replica rank. Moreover, system reliability cannot be measured directly through experiments, so a theoretical model is needed to evaluate it. In this paper, we present an analytical framework for evaluating the reliability of large-scale storage systems that use replication to protect data. Based on a Markov model, this framework provides quantitative answers that measure the impact of a series of design factors on system reliability, including the rank of the replicated data, system capacity, storage node capacity, data object size, repair bandwidth, and mean failure detection latency. Many storage system design tradeoffs can thus be reasoned about within this framework.