Abstract:
Distributed storage systems must introduce redundancy to keep data reliable in the face of node failures, and repairing failed nodes consumes a significant amount of bandwidth. Regenerating codes achieve the optimal tradeoff between storage overhead and repair bandwidth overhead. Because bandwidth is a scarcer resource than computation in distributed storage systems, exact minimum bandwidth regenerating (E-MBR) codes are a promising candidate for deployment: they can be implemented by a product-matrix construction, combine the advantages of regenerating codes with those of systematic codes, and place no restrictions on any of the construction parameters. However, the performance overhead of distributed storage systems based on this coding scheme has not yet been investigated or analyzed. This paper gives a formal description of the coding operations, which fall into three distinct phases: uploading, downloading, and repairing. We analyze the impact of CPU utilization, file size, buffer size, and Galois field size on the computing rates in these three phases. We find that distributed storage systems based on E-MBR codes can achieve high computing throughput if the construction parameters of the E-MBR codes are configured appropriately.