ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2018, Vol. 55, Issue (7): 1557-1568. doi: 10.7544/issn1000-1239.2018.20160915

Node Selection Algorithm During Multi-Nodes Repair Progress in Distributed Storage System

Liu Pei1,2, Jiang Ziyi1, Cao Xiu1,2   

  1. School of Computer Science and Technology, Fudan University, Shanghai 201203; 2. Engineering Research Center of Cyber Security Auditing and Monitoring (Fudan University), Ministry of Education, Shanghai 200433
  • Online: 2018-07-01

Abstract: In distributed storage systems, reducing the regeneration time of lost data in order to maintain high reliability has attracted increasing attention. Recent research shows that the node selection mechanism used during the repair process has a great impact on regeneration time. The SPSN (select provider select newcomer) algorithm has been proposed in earlier studies and suits the single-node failure scenario well. In real systems, however, it is very common to repair many nodes at the same time. In this scenario, SPSN is no longer optimal once its large time and space consumption is taken into account. By analyzing data failure traces from a real distributed file system, we propose B-WSJ (bandwidth based weak and strong judgement), a node selection algorithm that builds on the existing algorithms and on a repair model with parallelism, and that is suited to the multi-failure scenario. To describe the algorithm, we first define several concepts of node relationships on a link. We then use these concepts to realize the weak and strong judgement of a target node, with pre-processing and a pruning strategy added. Finally, the nodes with better bandwidth are chosen. To evaluate the node selection algorithm, we use the Waxman algorithm to generate network topologies and run experiments with real-system node failure models provided by the FTA (failure trace archive). The experimental results show that B-WSJ performs much better than the existing algorithms.
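
The evaluation relies on Waxman-style topology generation. As a rough illustration only, the sketch below shows one common formulation of the Waxman model: nodes are placed at random in the unit square, and each pair is linked with probability beta * exp(-d / (alpha * L)), where d is the distance between the two nodes and L is the largest pairwise distance. The function name `waxman_graph`, the parameter names and defaults, and the unit-square placement are assumptions made for this example and are not taken from the paper, which may configure the generator differently.

```python
import math
import random


def waxman_graph(n, alpha=0.4, beta=0.1, seed=None):
    """Generate a Waxman-style random topology on n nodes (illustrative sketch).

    Nodes are placed uniformly at random in the unit square (an assumption).
    An edge (u, v) is added with probability
        beta * exp(-d(u, v) / (alpha * L)),
    where d is the Euclidean distance and L is the largest pairwise distance.
    """
    rng = random.Random(seed)
    pos = [(rng.random(), rng.random()) for _ in range(n)]

    # Pairwise Euclidean distances for every unordered node pair.
    dist = {
        (u, v): math.dist(pos[u], pos[v])
        for u in range(n)
        for v in range(u + 1, n)
    }
    max_dist = max(dist.values(), default=0.0)

    edges = []
    for (u, v), d in dist.items():
        # Nearby nodes are more likely to be linked than distant ones.
        p = beta * math.exp(-d / (alpha * max_dist)) if max_dist > 0 else beta
        if rng.random() < p:
            edges.append((u, v))
    return pos, edges


if __name__ == "__main__":
    positions, links = waxman_graph(50, seed=7)
    print(len(positions), "nodes,", len(links), "links")
```

In this formulation, a larger beta raises the overall link density, while a larger alpha increases the share of long links relative to short ones. Link bandwidths, which a bandwidth-based selection algorithm would need, are not modeled here.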

Key words: distributed storage system, data repair, regeneration time, multi-node failure, node selection
