Abstract:
Disaggregated memory architectures are emerging as a significant trend for next-generation data centers, promising higher resource utilization and better scalability. Existing system software typically hides the microsecond-level latency of disaggregated memory by switching rapidly between lightweight threads so that many remote data access requests are in flight concurrently. However, this approach incurs substantial thread scheduling and communication overhead: it consumes CPU resources that would otherwise be available for computation and thus creates a new performance bottleneck. To address this challenge, a userspace asynchronous data path for disaggregated memory is proposed. The data path targets data prefetching and eviction, replacing the high-overhead multi-threaded concurrency model with an asynchronous I/O mechanism that overlaps computation and network I/O within fine-grained tasks. By combining adaptive batching with dynamic thread management, it minimizes CPU overhead while keeping remote memory requests timely and concurrent, thereby effectively hiding the latency of disaggregated memory access. A prototype is implemented on top of AIFM and evaluated with four typical application workloads. Experimental results show that, compared to AIFM, the proposed approach reduces the CPU overhead of data prefetching and eviction by 73% while increasing average end-to-end application throughput by 38%.
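
The idea of overlapping computation with remote fetches on a single thread, using batched asynchronous requests, can be illustrated with a minimal sketch. This is not the prototype's implementation: `fetch_remote`, `adapt_batch_size`, and `process` are hypothetical names, the remote read is simulated with a timer, and the queue-depth-based doubling and halving rule is an assumed stand-in for whatever adaptive batching policy the system actually uses.

```python
import asyncio


async def fetch_remote(batch, latency_s=0.001):
    """Hypothetical stand-in for a batched remote-memory read.

    A real data path would post a network request here; the sleep only
    simulates the microsecond-to-millisecond round trip.
    """
    await asyncio.sleep(latency_s)
    return {obj_id: f"data-{obj_id}" for obj_id in batch}


def adapt_batch_size(size, queue_depth, lo=1, hi=64):
    """Assumed policy: grow the batch when requests pile up, shrink when few remain."""
    if queue_depth > size:
        return min(hi, size * 2)
    if queue_depth < size // 2:
        return max(lo, size // 2)
    return size


async def process(object_ids, compute, batch_size=4):
    """Single-threaded loop that prefetches batch i+1 while computing on batch i."""
    pending = list(object_ids)
    results = []
    inflight = None  # task for the batch requested in the previous iteration
    while pending or inflight is not None:
        next_fetch = None
        if pending:
            batch_size = adapt_batch_size(batch_size, len(pending))
            batch, pending = pending[:batch_size], pending[batch_size:]
            # Issue the next request before touching the current data.
            next_fetch = asyncio.create_task(fetch_remote(batch))
        if inflight is not None:
            data = await inflight
            # Computation runs while next_fetch is still outstanding.
            results.extend(compute(value) for value in data.values())
        inflight = next_fetch
    return results


if __name__ == "__main__":
    print(asyncio.run(process(range(10), str.upper)))
```

In this sketch no thread is created or switched per request: one loop issues the next batch, then computes on the previous one, so the simulated network latency elapses during useful work.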