
    A Mapping Algorithm for Replicated Data in Large-Scale Storage Systems

    • Abstract: Data mapping is a critical problem in large-scale storage systems. Besides high performance and scalability, a good data mapping algorithm should also minimize the amount of data migrated to keep the system balanced in a dynamic storage environment. This paper proposes a decentralized mapping algorithm for replicated data objects in large-scale storage systems that requires no data mapping table. By introducing storage node weights and drawing on consistent hashing, the algorithm distributes massive numbers of data objects evenly across tens or hundreds of thousands of storage devices in proportion to their service capacity. When the number of storage devices changes, the data remains evenly distributed and the amount of data migrated for rebalancing is close to the theoretical lower bound. By maintaining only a small amount of information, front-end applications can compute the location of every data object without consulting a conventional centralized object mapping table, which greatly improves system scalability. The complexity of the mapping computation is also low. Test results show that the number of objects allocated to each storage device deviates from the theoretical value by less than 5%, and that the number of objects migrated when the number of devices changes deviates from the theoretical lower bound by less than 1%.
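    The abstract does not give the algorithm's details, so the following is only a minimal sketch of the general idea it names: a table-free placement computed from node weights and consistent hashing. It is not the paper's algorithm. The class name WeightedRing, the vnodes_per_unit parameter, the use of virtual nodes to express weight, and the clockwise walk for choosing replicas are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class WeightedRing:
    """Hypothetical weighted consistent-hash ring (illustrative, not the paper's method).

    Each node owns weight * vnodes_per_unit points on the ring, so a node with
    twice the weight is expected to receive roughly twice as many objects.
    """

    def __init__(self, weights: dict[str, int], vnodes_per_unit: int = 100):
        points = []
        for node, weight in weights.items():
            for i in range(weight * vnodes_per_unit):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._keys = [position for position, _ in points]
        self._nodes = [node for _, node in points]
        self._distinct = len(weights)

    def locate(self, obj_id: str, replicas: int = 3) -> list[str]:
        """Return up to `replicas` distinct nodes, walking clockwise from the object's hash."""
        want = min(replicas, self._distinct)
        start = bisect.bisect_right(self._keys, _hash(obj_id))
        chosen: list[str] = []
        for step in range(len(self._nodes)):
            node = self._nodes[(start + step) % len(self._nodes)]
            if node not in chosen:
                chosen.append(node)
                if len(chosen) == want:
                    break
        return chosen


# Usage: every client holding the same weight table computes the same placement,
# so no per-object mapping table has to be stored or consulted.
ring = WeightedRing({"node-a": 1, "node-b": 1, "node-c": 2})
placement = ring.locate("object-42", replicas=2)
```

    In this sketch, adding a node only inserts new points on the ring, so an object's first replica either stays where it was or moves to the new node. That is the property that keeps migration near the lower bound, which for a new node of weight w joining a system of total weight W is roughly w / (W + w) of the objects.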

       
