Wei Zheng, Dou Yu, Gao Yanzhen, Ma Jie, Sun Ninghui, Xing Jing. A Consistent Hash Data Placement Algorithm Based on Stripe[J]. Journal of Computer Research and Development, 2021, 58(4): 888-903. DOI: 10.7544/issn1000-1239.2021.20190732
A Consistent Hash Data Placement Algorithm Based on Stripe

Funds: This work was supported by the National Key Research and Development Program of China (2018YFC0809300), the National Natural Science Foundation of China (61502454), and the Distributed Full Flash Project of ECR Team of Lenovo Research Institute.
  • Published Date: March 31, 2021
  • Distributed storage systems are the carriers of data storage and are widely used in big data applications. Storage systems widely adopt erasure codes for their high space efficiency and reliability. In an EB-scale erasure-coded distributed storage system, metadata management is costly, and the efficiency of querying metadata such as location information affects I/O latency and throughput. Centralized data placement algorithms, which rely on recorded location information, must access metadata servers frequently, which limits performance optimization. Decentralized data placement algorithms based on hash mapping are therefore increasingly applied, but they have problems during node changes and data recovery: locations are hard to change, large amounts of data are migrated, and recovery and migration have low concurrency. This paper proposes a stripe-based consistent hash data placement algorithm (SCHash). SCHash places data in units of stripes. By transforming the mapping from data blocks to nodes into a mapping from stripes to node groups, it reduces the amount of data migrated when nodes change; during recovery, a smaller proportion of data is therefore migrated and recovery is faster. On the basis of SCHash, this paper designs and implements a stripe-based recovery strategy with parallel I/O scheduling. The strategy avoids selecting data blocks on the same node within an I/O operation, which increases I/O parallelism. Compared with the APHash algorithm, SCHash reduces data transfer during recovery by 46.71% to 85.28%. The recovery rate improves by 48.16% when nodes are rebuilt within the stripe and by 138.44% when nodes are rebuilt outside the stripe.
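The idea of mapping stripes to node groups, rather than individual blocks to nodes, can be illustrated with a minimal sketch. It is not the SCHash algorithm itself, whose details the abstract does not give. It assumes an ordinary consistent hash ring with virtual nodes, and the names `StripeRing`, `node_group`, `locate_block`, and the `vnodes` parameter are hypothetical. A stripe ID is hashed onto the ring, and the first `group_size` distinct nodes found clockwise form its node group, so the blocks of one stripe always land on different nodes.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash of a string (SHA-256 based, used only for placement)."""
    return int.from_bytes(hashlib.sha256(key.encode()).digest()[:8], "big")


class StripeRing:
    """Hypothetical ring mapping a stripe ID to a group of distinct nodes."""

    def __init__(self, nodes, group_size, vnodes=64):
        self.group_size = group_size  # blocks per stripe, e.g. k data + m parity
        self.vnodes = vnodes          # virtual nodes per physical node (assumed)
        self._ring = []               # sorted list of (hash, node) pairs
        for node in nodes:
            self.add_node(node)

    def add_node(self, node):
        for v in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{v}"), node))

    def remove_node(self, node):
        self._ring = [(h, n) for h, n in self._ring if n != node]

    def node_group(self, stripe_id):
        """Walk clockwise from the stripe's hash, collecting distinct nodes."""
        if len({n for _, n in self._ring}) < self.group_size:
            raise ValueError("not enough nodes to place one stripe")
        start = bisect.bisect_left(self._ring, (_hash(stripe_id), ""))
        group = []
        for i in range(len(self._ring)):
            node = self._ring[(start + i) % len(self._ring)][1]
            if node not in group:
                group.append(node)
                if len(group) == self.group_size:
                    break
        return group

    def locate_block(self, stripe_id, block_index):
        """Block i of a stripe is stored on the i-th node of its group."""
        return self.node_group(stripe_id)[block_index]


# Example: 10 nodes, stripes of 6 blocks (for instance 4 data + 2 parity).
ring = StripeRing([f"node{i}" for i in range(10)], group_size=6)
print(ring.node_group("stripe-42"))
print(ring.locate_block("stripe-42", 0))
```

In this sketch, removing a node leaves the group of every stripe that did not use that node unchanged, and an affected stripe keeps its surviving members and gains one replacement node. The position of a surviving node within the group may shift, which a real system would have to handle; the sketch ignores that.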