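• Outstanding Chinese sci-tech journal
  • CCF-recommended Class A Chinese-language journal
  • T1-class high-quality sci-tech journal in the computing field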
Zhang Kaixin, Wang Yijie, Bao Han, Kan Junhui. An Adaptive Erasure-Coded Data Access Method for Cross-Cloud Collaborative Scheduling of Storage and Computation[J]. Journal of Computer Research and Development, 2024, 61(3): 571-588. DOI: 10.7544/issn1000-1239.202330541

An Adaptive Erasure-Coded Data Access Method for Cross-Cloud Collaborative Scheduling of Storage and Computation

Funds: This work was supported by the National Key Research and Development Program of China (2022ZD0115302), the National Natural Science Foundation of China (61379052), the Science Foundation of Ministry of Education of China (2018A02002), and the Natural Science Foundation for Distinguished Young Scholars of Hunan Province (14JJ1026).
  • Author Bio:

    Zhang Kaixin: born in 2002. Master's candidate. His main research interests include cloud storage and erasure coding.

    Wang Yijie: born in 1971. PhD, professor, PhD supervisor. Distinguished member of CCF. Her main research interests include distributed storage, big data analysis, and cloud computing.

    Bao Han: born in 1992. PhD. His main research interests include cloud storage and erasure coding.

    Kan Junhui: born in 2000. Master's candidate. His main research interests include cloud storage and collaborative scheduling of storage and computation.

  • Received Date: June 20, 2023
  • Revised Date: December 03, 2023
  • Available Online: December 19, 2023
  • The growing demand for cross-cloud collaborative scheduling of storage and computation places high requirements on cross-cloud data access speed. Cross-cloud data access methods built on data redundancy techniques (erasure coding and replication) are therefore gaining attention because they offer high access speed. Among them, methods based on erasure coding have become a research focus because of their low storage overhead and high fault tolerance. To improve data access speed by shortening the transmission time of coded blocks, existing erasure-coding-based methods introduce caching and optimize the coded data access scheme. However, because their cache management is coarse-grained and is not optimized jointly with the coded data access scheme, these methods suffer from few cache hits, low cache hit efficiency, and a high volume of accesses to coded blocks with low transmission speed, all of which prolong coded block transmission time. To address this, we first propose an IPFS-based cross-cloud storage system framework (IBCS), which uses the IPFS data slice management mechanism to achieve fine-grained cache management and thereby increases cache hits. We then propose an adaptive erasure-coded data access method for cross-cloud collaborative scheduling of storage and computation (AECAM). During data access, AECAM evaluates the transmission speed of each coded block from the distribution of coded blocks (including cached coded blocks) and data access nodes, and accordingly formulates a coded data access scheme that avoids coded blocks with low transmission speed. In addition, AECAM identifies coded blocks that are likely to be selected by the coded data access scheme yet have low transmission speed, and caches them near the data access nodes, improving both cache hits and cache hit efficiency.
Based on IBCS and AECAM, we build a cross-cloud storage system for collaborative scheduling of storage and computation (C2S2). Experiments in a cross-cloud environment show that, compared with existing erasure-coded storage systems that use caching, C2S2 improves data access speed by 75.22% to 81.29%.
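
The selection idea described in the abstract can be illustrated with a small sketch. It is not the AECAM algorithm itself, whose details are not given on this page. It assumes an MDS code (any k coded blocks rebuild the data), parallel transfers, and a greedy choice of the k blocks with the shortest estimated transfer time. The names `CodedBlock`, `transfer_time`, `choose_access_scheme`, and `cache_near`, as well as the bandwidth figures and block sizes, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class CodedBlock:
    block_id: int
    location: str    # cloud (or cache site) currently holding the block
    size_mb: float


def transfer_time(block, access_node, bandwidth_mbps):
    """Estimated seconds to move one block to the access node."""
    return block.size_mb * 8 / bandwidth_mbps[(block.location, access_node)]


def choose_access_scheme(blocks, k, access_node, bandwidth_mbps):
    """Return the k fastest blocks and the time of the slowest one chosen.

    Assumes any k of the coded blocks can rebuild the data (an MDS code)
    and that the chosen blocks are fetched in parallel.
    """
    if len(blocks) < k:
        raise ValueError("fewer than k coded blocks are available")
    ranked = sorted(blocks, key=lambda b: transfer_time(b, access_node, bandwidth_mbps))
    chosen = ranked[:k]
    return chosen, transfer_time(chosen[-1], access_node, bandwidth_mbps)


def cache_near(block, access_node):
    """Model caching by placing a copy of the block in the access node's cloud."""
    return replace(block, location=access_node)


# Illustrative figures only: link speeds in Mbit/s toward an access node in cloud "A".
BANDWIDTH = {("A", "A"): 800.0, ("B", "A"): 100.0, ("C", "A"): 40.0}
BLOCKS = [CodedBlock(0, "A", 8.0), CodedBlock(1, "B", 8.0),
          CodedBlock(2, "C", 8.0), CodedBlock(3, "C", 8.0)]

chosen, bottleneck = choose_access_scheme(BLOCKS, 2, "A", BANDWIDTH)
slowest = chosen[-1]  # selected by the scheme, yet the slowest to fetch
cached_blocks = [cache_near(b, "A") if b == slowest else b for b in BLOCKS]
chosen_cached, bottleneck_cached = choose_access_scheme(cached_blocks, 2, "A", BANDWIDTH)
print([b.block_id for b in chosen], bottleneck)
print([b.block_id for b in chosen_cached], bottleneck_cached)
```

In this toy setup the scheme skips the two blocks in the slowest cloud, and caching the one selected block that still crosses clouds removes the remaining bottleneck. A real system would also have to weigh cache capacity and access popularity, which this sketch ignores.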

