    Citation: Zhang Kaixin, Wang Yijie, Bao Han, Kan Junhui. An Adaptive Erasure-Coded Data Access Method for Cross-Cloud Storage-Computation Collaborative Scheduling[J]. Journal of Computer Research and Development.

    An Adaptive Erasure-Coded Data Access Method for Cross-Cloud Storage-Computation Collaborative Scheduling

    • Abstract: The growing demand for cross-cloud storage-computation collaborative scheduling places high requirements on cross-cloud data access speed. As a result, cross-cloud data access methods based on data redundancy techniques (erasure coding and replication), which offer relatively high access speed, are attracting increasing attention. Among them, methods based on erasure coding have become a research focus because of their low storage overhead and high fault tolerance. To improve data access speed by shortening the transmission time of coded blocks, existing erasure-coding-based cross-cloud data access methods introduce caching and optimize the coded data access scheme. However, because their cache management is coarse-grained and they do not jointly optimize cache management and the coded data access scheme, these methods suffer from few cache hits, low cache hit efficiency, and a large number of accesses to coded blocks with low transmission speed, so their coded block transmission time remains long. To address this, we first propose an IPFS-based cross-cloud storage system framework (IBCS), built on the InterPlanetary File System (IPFS), which uses the IPFS data slice management mechanism to achieve fine-grained cache management and thereby increase cache hits. We then propose an adaptive erasure-coded data access method for cross-cloud storage-computation collaborative scheduling (AECAM). AECAM estimates the transmission speed of each coded block during data access from the distribution of coded blocks (including cached coded blocks) and data access nodes, and uses these estimates to formulate a coded data access scheme that avoids coded blocks with low transmission speed. In addition, AECAM identifies coded blocks that are likely to be selected when it formulates the access scheme but whose actual transmission speed is low, and caches them near the data access nodes, which increases both cache hits and cache hit efficiency. Finally, we build a cross-cloud storage system for storage-computation collaborative scheduling (C2S2) based on IBCS and AECAM. Experiments in a cross-cloud environment show that, compared with existing erasure-coded storage systems that use caching, C2S2 improves data access speed by 75.22% to 81.29%.
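
The two ideas the abstract attributes to AECAM (choosing which coded blocks to fetch based on estimated transmission speed, and caching blocks that are often chosen yet slow) can be illustrated with a small sketch. This is not the authors' algorithm: it assumes an MDS erasure code in which any k of the n coded blocks can rebuild the data, a single data access node, and made-up speed estimates. The function names, block ids, and the threshold are hypothetical.

```python
from collections import Counter


def choose_blocks(speeds, k):
    """Pick the k blocks with the highest estimated transmission speed.

    speeds maps a block id to its estimated speed (MB/s) as seen from one
    data access node. Assumption: any k blocks suffice to decode (MDS code).
    """
    if k > len(speeds):
        raise ValueError("fewer than k coded blocks are reachable")
    # Sort fastest first; break ties by block id so the result is stable.
    ranked = sorted(speeds, key=lambda b: (-speeds[b], b))
    return ranked[:k]


def cache_candidates(plans, speeds, slow_threshold, limit):
    """Rank blocks that past access plans selected often but that are slow."""
    counts = Counter(block for plan in plans for block in plan)
    slow = [b for b in counts if speeds[b] < slow_threshold]
    # Most frequently selected first, then slowest first, then by id.
    slow.sort(key=lambda b: (-counts[b], speeds[b], b))
    return slow[:limit]


# Hypothetical estimates for a (6, 4) code: six blocks, any four decode.
speeds = {"b0": 90.0, "b1": 12.0, "b2": 75.0, "b3": 8.0, "b4": 40.0, "b5": 15.0}
plan = choose_blocks(speeds, k=4)
to_cache = cache_candidates([plan] * 3, speeds, slow_threshold=50.0, limit=1)
print(plan, to_cache)
```

In this toy setting the plan skips the two slowest blocks, and the slowest block that still had to be fetched becomes the caching candidate, which mirrors the abstract's point that caching decisions and access-scheme decisions are made together rather than independently.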

       
