• Outstanding Chinese Sci-Tech Journal
  • CCF-recommended Class A Chinese journal
  • T1-class high-quality sci-tech journal in computing
Wang Qing, Li Junru, Shu Jiwu. Survey on In-Network Storage Systems[J]. Journal of Computer Research and Development, 2023, 60(11): 2681-2695. DOI: 10.7544/issn1000-1239.202220865

Survey on In-Network Storage Systems

Funds: This work was supported by the National Natural Science Foundation of China (61832011).
  • Author Bio:

    Wang Qing: born in 1997. PhD. His main research interests include storage systems and memory systems.

    Li Junru: born in 1997. PhD candidate. His main research interests include programmable networks and distributed storage systems.

    Shu Jiwu: born in 1968. PhD, professor, PhD supervisor. His main research interests include non-volatile memory systems and technologies, storage security and reliability, and parallel and distributed computing.

  • Received Date: October 08, 2022
  • Revised Date: March 05, 2023
  • Available Online: August 01, 2023
  • Abstract: Programmable network devices, represented by programmable switches and SmartNICs, are increasingly deployed in modern data centers to execute customized data processing logic on the data transmission path, which opens new opportunities for building high-performance in-network storage systems. However, programmable network devices have limited hardware resources (e.g., limited expressive power and small memory space), so fully exploiting their advantages to maximize storage-system acceleration remains challenging. We systematically review recent research progress on in-network storage systems. First, we describe the hardware architecture and performance characteristics of programmable network devices and, on this basis, summarize two major challenges in building high-performance in-network storage systems: 1) the division of labor between hardware and software, and 2) the fault tolerance of storage systems. Then, we classify and describe existing in-network storage systems according to the tasks that programmable network devices perform: data caching, distributed coordination, request scheduling, and data aggregation. Moreover, using several representative in-network storage systems as examples, we analyze the corresponding design difficulties and software techniques. Finally, we identify open problems for future research on in-network storage systems, including switch-NIC collaboration, data security, multi-tenancy, and automatic function offloading.

  • Cited by

    1. Liu Yunjie, Wang Shuo, Huang Tao, Wang Jiasen. Research on the development of data-computing converged network technology[J]. Strategic Study of CAE, 2025(01): 1-13 (in Chinese)
