Abstract:
The growing demand for cross-cloud collaborative scheduling of storage and computation places high demands on cross-cloud data access speed. As a result, cross-cloud data access methods based on data redundancy techniques (erasure coding and multiple replicas), which offer high cross-cloud data access speed, are gaining attention. Among them, methods based on erasure coding have become a hot research topic because of their low storage overhead and high fault tolerance. To improve data access speed by shortening the transmission time of coded blocks, existing erasure-coding-based cross-cloud data access methods introduce caching and optimize the coded data access scheme. However, because their cache management is coarse-grained and is not optimized jointly with the coded data access scheme, these methods suffer from few cache hits, low cache hit efficiency, and a high volume of accesses to coded blocks with low transmission speed, all of which prolong coded block transmission time. To address these problems, we first propose an IPFS-based cross-cloud storage system framework (IBCS) that realizes fine-grained cache management on top of the IPFS data slice management mechanism and thereby improves cache hits. We then propose an adaptive erasure-coded data access method for cross-cloud collaborative scheduling of storage and computation (AECAM). During data access, AECAM evaluates the transmission speed of each coded block based on the distribution of coded blocks (including cached coded blocks) and data access nodes, and accordingly formulates a coded data access scheme that avoids accessing coded blocks with low transmission speed. In addition, AECAM identifies coded blocks that are likely to be selected in the coded data access scheme yet have low transmission speed, and caches them near the data access nodes, improving both cache hits and hit efficiency.
We build a cross-cloud storage system for collaborative scheduling of storage and computation (C2S2) based on IBCS and AECAM. Experiments in a cross-cloud environment show that, compared with existing erasure-coded storage systems that use caching, C2S2 improves data access speed by 75.22% to 81.29%.
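
To make the idea of a speed-aware coded data access scheme concrete, the following minimal Python sketch shows one way such a selection could look. It is an illustration under stated assumptions, not the AECAM algorithm itself: it assumes a code in which any k distinct coded blocks suffice to recover the data, it assumes per-link transmission speeds are already known, and every name in it (BlockCopy, plan_access, cache_candidates, link_speed, the node labels, and the threshold) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockCopy:
    """One stored copy of a coded block (hypothetical model)."""
    block_id: int    # index of the coded block within its stripe
    location: str    # cloud or cache node that holds this copy
    is_cached: bool  # True if this copy lives in a cache


def plan_access(copies, access_node, link_speed, k):
    """Pick the k coded blocks with the highest estimated speed.

    Assumption: any k distinct coded blocks are enough to decode.
    link_speed maps (location, access_node) to an estimated speed.
    """
    best = {}  # block_id -> (fastest copy, its speed)
    for copy in copies:
        speed = link_speed[(copy.location, access_node)]
        if copy.block_id not in best or speed > best[copy.block_id][1]:
            best[copy.block_id] = (copy, speed)
    if len(best) < k:
        raise ValueError("fewer than k distinct coded blocks are reachable")
    ranked = sorted(best.values(), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


def cache_candidates(plan, threshold):
    """Blocks that the plan selects but that are still slow and uncached.

    Assumption: 'slow' simply means speed below a fixed threshold.
    """
    return [copy.block_id for copy, speed in plan
            if speed < threshold and not copy.is_cached]


# Hypothetical stripe of 4 coded blocks, one of which also has a cached copy.
copies = [
    BlockCopy(0, "cloudA", False),
    BlockCopy(1, "cloudB", False),
    BlockCopy(1, "edge1", True),
    BlockCopy(2, "cloudC", False),
    BlockCopy(3, "cloudD", False),
]
link_speed = {
    ("cloudA", "n1"): 50.0,
    ("cloudB", "n1"): 10.0,
    ("edge1", "n1"): 100.0,
    ("cloudC", "n1"): 30.0,
    ("cloudD", "n1"): 5.0,
}

plan = plan_access(copies, "n1", link_speed, k=3)
to_cache = cache_candidates(plan, threshold=40.0)
```

In this toy setup the plan skips the slowest block entirely and reads block 1 from its cached copy, while a block that is selected but still slow is flagged as a caching candidate near the access node. A real system would also need to estimate speeds online and account for cache capacity, which this sketch leaves out.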