    Lightweight Embedded RDMA Software Stack for High-Speed Sensor Data Acquisition

    Abstract: Remote direct memory access (RDMA) is widely used for the high-bandwidth, low-latency transmission required by supercomputing and intelligent computing, and it offers significant advantages over the traditional TCP/IP protocol stack in bandwidth, efficiency, and latency. Connecting high-speed sensors to computing networks through RDMA has gradually become an industry consensus. In this scenario, however, the embedded sensor end is resource-constrained, which makes a standard RDMA software stack difficult to deploy. This paper studies lightweight techniques for embedded RDMA software stacks. For a sensor end with neither a high-performance CPU nor an operating system, it proposes a lightweight RDMA connection establishment method based on reliable UDP and a lightweight RDMA driver based on reliable Ethernet, and it designs an ultra-lightweight RDMA software stack suited to embedded sensor ends. Experimental results show that, compared with the standard RDMA software stack, the lightweight embedded stack reduces connection establishment overhead by approximately 94% and software driver overhead by approximately 84%. When an embedded sensor end and a standard end perform sensor data acquisition and reliable transmission over RDMA through a single level of switching, system bandwidth reaches 98.04 Gbps, end-to-end latency is as low as 4.58 μs, and CPU overhead on the standard end is as low as 4.2%. In terms of protocol compatibility, the READ and WRITE bandwidth of the embedded RDMA end reaches the peak performance of the standard end as the number of queue pairs (QPs) and the data volume increase, and READ and WRITE latency stays at the microsecond level for small data volumes.

       
