Abstract:
In distributed storage systems, reducing the time needed to regenerate lost data, and thereby maintaining high reliability, has attracted increasing attention. Recent research shows that the node selection mechanism used during the repair process has a great impact on regeneration time. The SPSN (select provider select newcomer) algorithm proposed in earlier studies suits the single-node failure scenario well. In real systems, however, it is very common to repair many nodes at the same time. In this scenario, SPSN is no longer optimal once its large time and space consumption is taken into account. By analyzing the data failure traces of a real distributed file system, we propose B-WSJ (bandwidth based weak and strong judgement), a node selection algorithm that builds on existing algorithms and on a parallel repair model and is suited to the multi-failure scenario. To describe the algorithm, we first define several concepts of node relationship on a link. We then use these concepts to perform the weak and strong judgement of a target node, with pre-processing and pruning strategies added. Finally, the nodes with better bandwidth are chosen. To evaluate the performance of the node selection algorithm, we use the Waxman algorithm to generate network topologies and run extensive experiments with node failure models from real systems provided by FTA (failure trace archive). The experimental results show that B-WSJ greatly outperforms the existing algorithms.
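For readers unfamiliar with the Waxman model mentioned in the evaluation setup, the following is a minimal sketch of how such a topology might be generated. It assumes the common formulation in which nodes are placed uniformly in a unit square and two nodes at distance d are linked with probability beta * exp(-d / (alpha * L)), where L is the largest distance between any two nodes. The function name `waxman_graph`, the default values of `alpha` and `beta`, and the unit-square placement are illustrative assumptions, not the configuration used in the experiments, and link bandwidth is not modeled here.

```python
import math
import random
from itertools import combinations


def waxman_graph(n, alpha=0.4, beta=0.1, seed=None):
    """Return (positions, edges) for a Waxman random graph on n nodes.

    Hypothetical helper for illustration; parameter defaults are assumptions.
    """
    rng = random.Random(seed)
    # Place nodes uniformly at random in the unit square.
    positions = [(rng.random(), rng.random()) for _ in range(n)]
    pairs = list(combinations(range(n), 2))
    dist = {(u, v): math.dist(positions[u], positions[v]) for u, v in pairs}
    # L is the largest distance between any two nodes.
    max_dist = max(dist.values(), default=0.0)
    # Guard against a zero scale (only possible if all nodes coincide).
    scale = alpha * max_dist or 1.0
    edges = []
    for u, v in pairs:
        # Link probability decays exponentially with distance.
        p = beta * math.exp(-dist[(u, v)] / scale)
        if rng.random() < p:
            edges.append((u, v))
    return positions, edges


if __name__ == "__main__":
    positions, edges = waxman_graph(50, seed=42)
    print(len(positions), "nodes,", len(edges), "links")
```

In this formulation a larger `beta` raises the overall link density, while a larger `alpha` makes long links more likely relative to short ones. Fixing `seed` makes a generated topology reproducible across runs.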