Huang Mandi, Li Tao, Yang Hui, Li Chenglong, Zhang Yutao, Sun Zhigang. Survey on Ethernet RDMA Network Interface Card[J]. Journal of Computer Research and Development, 2025, 62(5): 1262-1289. DOI: 10.7544/issn1000-1239.202331036

Survey on Ethernet RDMA Network Interface Card

Funds: This work was supported by the Key Laboratory Foundation of the State Administration for Science, Technology and Industry for National Defense (WDZC20245250113).
  • Author Bio:

    Huang Mandi: born in 1999. Master. Her main research interest is RDMA.

    Li Tao: born in 1983. PhD, associate professor. His main research interests include high-performance network systems and chips.

    Yang Hui: born in 1986. PhD, associate professor. Her main research interests include RDMA and network processors.

    Li Chenglong: born in 1995. PhD, assistant professor. His main research interests include time-sensitive networking, in-network computing, and RISC-V.

    Zhang Yutao: born in 2000. Master. His main research interests include RDMA and data center congestion control.

    Sun Zhigang: born in 1973. PhD, professor. His main research interests include software-defined networking, time-sensitive networking, network architecture, and network security.

  • Received Date: December 24, 2023
  • Revised Date: August 26, 2024
  • Accepted Date: October 14, 2024
  • Available Online: October 23, 2024
  • With the rapid expansion of data centers and the significant increase in network bandwidth, the traditional software network protocol stack incurs high processor overhead and struggles to meet the requirements of many data center applications for throughput, latency, and other metrics. Remote direct memory access (RDMA) uses zero copy, kernel bypass, and processor function offloading to read and write remote host memory with high bandwidth and low latency. Ethernet-compatible RDMA is being deployed in data centers, and the Ethernet RDMA NIC, as the main device carrying its functions, plays a crucial role in that deployment. This survey analyzes Ethernet RDMA NICs from three aspects: architecture, optimization, and implementation and evaluation. 1) We summarize the general architecture of the Ethernet RDMA NIC and introduce its key functional components. 2) We focus on optimization techniques for storage resources, reliable transmission, and application-related aspects. These include optimizing connection scalability for NIC cache resources and registration and access for host memory resources; optimizing congestion control, flow control, and retransmission mechanisms to achieve reliable transmission over lossy Ethernet; and optimizing for different storage types in distributed storage, database systems, and cloud storage systems, as well as multi-tenant performance isolation, security, and programmability for data center applications. 3) We investigate different implementation and evaluation methods. Finally, we give a summary and an outlook.
