    A Userspace Asynchronous Data Path for Disaggregated Memory


Abstract: Disaggregated memory architectures are rapidly emerging as a significant trend for next-generation data centers, promising to enhance resource utilization and scalability. Existing system software often employs rapid context switching between lightweight threads to generate concurrent remote data access requests, thereby masking the microsecond-level latency inherent in disaggregated memory. However, this approach introduces substantial thread scheduling and communication overhead, consuming a significant amount of CPU resources that would otherwise be dedicated to computation, and thus creates a new performance bottleneck. To address this challenge, this paper proposes a user-space asynchronous data path for disaggregated memory. This work focuses on data prefetching and eviction, abandoning the high-overhead multi-threaded concurrency model in favor of an asynchronous I/O mechanism that overlaps computation with network I/O in fine-grained tasks. By combining adaptive batching with dynamic thread management, the design minimizes CPU overhead while preserving the timeliness and concurrency of remote memory requests, thereby effectively hiding the latency of disaggregated memory access. We have implemented a prototype system on top of AIFM and evaluated it with four typical application workloads. The results demonstrate that, compared to AIFM, our work reduces the CPU overhead of data prefetching and eviction by 73% while increasing end-to-end application throughput by an average of 38%.
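The core idea of overlapping computation with in-flight network I/O, amortized by batching, can be illustrated with a minimal asyncio sketch. This is a hypothetical simulation, not the paper's implementation: the key names, the fixed batch size, and the simulated remote latency are all assumptions, and real systems such as AIFM operate on remoteable object abstractions over RDMA or TCP rather than a Python dictionary.

```python
import asyncio

# Hypothetical simulated remote-access latency; real disaggregated
# memory exhibits microsecond-scale round trips.
REMOTE_LATENCY = 0.001

async def fetch_batch(keys, remote_store):
    # One simulated network round trip fetches a whole batch of
    # objects, amortizing per-request overhead (the idea behind
    # adaptive batching).
    await asyncio.sleep(REMOTE_LATENCY)
    return {k: remote_store[k] for k in keys}

async def run(keys, remote_store, batch_size):
    local_cache = {}
    results = []
    # Issue the prefetch for the first batch, then, on each iteration,
    # kick off the next batch's fetch before computing on the current
    # one: computation and network I/O overlap instead of serializing.
    pending = asyncio.create_task(fetch_batch(keys[:batch_size], remote_store))
    for i in range(0, len(keys), batch_size):
        batch = await pending  # blocks only if the prefetch is still in flight
        nxt = keys[i + batch_size : i + 2 * batch_size]
        if nxt:
            pending = asyncio.create_task(fetch_batch(nxt, remote_store))
        local_cache.update(batch)
        # Stand-in "computation" on the now-local batch.
        results.extend(local_cache[k] * 2 for k in keys[i : i + batch_size])
    return results

if __name__ == "__main__":
    store = {f"k{i}": i for i in range(10)}
    out = asyncio.run(run(list(store), store, batch_size=4))
    print(out)
```

Because each batch's fetch is issued before the previous batch is consumed, the simulated network delay is hidden behind useful work on already-fetched data, which is the latency-hiding effect the abstract describes, here without any extra worker threads.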

       
