Abstract:
Data mapping is a critical problem in large-scale storage systems. Besides offering high performance and scalability, a good data mapping algorithm should also minimize the data migration needed to keep the system balanced in a dynamic storage environment. This paper proposes a decentralized mapping algorithm for replicated data in large-scale storage systems. By combining storage node weighting with a consistent hashing mechanism, the algorithm can distribute massive numbers of storage objects among tens or hundreds of thousands of storage devices according to their service capacities. When the number of storage devices changes, the amount of data migrated to rebalance the system comes close to the theoretical lower bound. By maintaining only a small amount of information, front-end applications can compute the location of every data object without consulting a conventional centralized object mapping table, which greatly improves system scalability. At the same time, the computational complexity of the mapping process is low. Test results show that the number of data objects allocated to each storage device deviates from the theoretical value by less than 5%, and that the number of data objects migrated during rebalancing deviates from the theoretical lower bound by less than 1%.
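
To make the idea of weighted, table-free placement concrete, the following is a minimal sketch of a generic weighted consistent-hash ring with virtual nodes. It is not the algorithm proposed in this paper, whose details the abstract does not give. The class name `WeightedRing`, the `vnodes_per_weight` parameter, the use of MD5, and the node names are all assumptions made for illustration only.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer (MD5 is used only because it is stable)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class WeightedRing:
    """Generic weighted consistent-hash ring (illustrative, hypothetical names)."""

    def __init__(self, weights, vnodes_per_weight=100):
        # weights: dict mapping node name -> positive integer weight.
        # A node with weight w contributes w * vnodes_per_weight ring points,
        # so its expected share of objects grows with its weight.
        self._points = sorted(
            (_hash(f"{node}#{i}"), node)
            for node, w in weights.items()
            for i in range(w * vnodes_per_weight)
        )
        self._keys = [h for h, _ in self._points]
        self._node_count = len(weights)

    def locate(self, obj_id, replicas=1):
        """Return `replicas` distinct nodes for obj_id, walking clockwise."""
        if replicas > self._node_count:
            raise ValueError("more replicas requested than nodes available")
        start = bisect.bisect_right(self._keys, _hash(obj_id))
        chosen = []
        for step in range(len(self._points)):
            node = self._points[(start + step) % len(self._points)][1]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == replicas:
                    break
        return chosen


# Usage: any client holding only the weight table can compute locations.
ring = WeightedRing({"a": 1, "b": 2, "c": 1})
print(ring.locate("object-42", replicas=2))
```

In this sketch a client needs only the node weight table to locate an object, and adding a node changes the primary location only of objects that move to the new node, which is the property that keeps migration small.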