Abstract:
Disaggregated memory architectures are emerging as a major trend for next-generation data centers, promising to improve resource utilization and scalability. Existing system software often relies on rapid context switching between lightweight threads to generate concurrent remote data access requests, thereby masking the microsecond-level latency inherent in disaggregated memory. However, this approach introduces substantial thread scheduling and communication overhead, consuming CPU resources that would otherwise be devoted to computation and thus creating a new performance bottleneck.
To address this challenge, this paper proposes a user-space asynchronous data path for disaggregated memory. Focusing on data prefetching and eviction, it replaces the high-overhead multi-threaded concurrency model with an asynchronous I/O mechanism that overlaps computation with network I/O in fine-grained tasks. By combining adaptive batching with dynamic thread management, the design minimizes CPU overhead while preserving the timeliness and concurrency of remote memory requests, thereby effectively hiding the latency of disaggregated memory access. We have implemented a prototype system on top of AIFM and evaluated it with four typical application workloads. The results show that, compared to AIFM, our work reduces the CPU overhead of data prefetching and write-back by 73% while increasing average end-to-end application throughput by 38%.
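To illustrate the general pattern described above, the following is a minimal, single-threaded sketch of an asynchronous prefetch data path with a toy adaptive batch size; it is not the AIFM-based implementation. The RemoteMemory interface (issue_prefetch / poll_completions) and the batching heuristic are hypothetical stand-ins for an RDMA-style non-blocking remote memory interface.

```cpp
// Minimal single-threaded sketch: issue prefetches asynchronously, compute on
// already-arrived data, and harvest completions, so compute and I/O overlap.
// The remote interface and the batching heuristic are hypothetical.
#include <algorithm>
#include <cstdint>
#include <cstdio>
#include <deque>
#include <vector>

struct RemoteMemory {
    // Non-blocking: enqueue a fetch for object `id` and return immediately.
    void issue_prefetch(uint64_t id) { in_flight.push_back(id); }
    // Poll completed fetches without blocking (here: everything issued so far).
    std::vector<uint64_t> poll_completions() {
        std::vector<uint64_t> done(in_flight.begin(), in_flight.end());
        in_flight.clear();
        return done;
    }
    std::deque<uint64_t> in_flight;
};

int main() {
    RemoteMemory rmem;
    const uint64_t total_objects = 64;
    uint64_t next_to_issue = 0;
    std::deque<uint64_t> ready;  // objects whose data has already arrived
    size_t batch = 4;            // toy adaptive batch size

    while (next_to_issue < total_objects || !ready.empty() ||
           !rmem.in_flight.empty()) {
        // 1. Issue the next batch of prefetches asynchronously.
        for (size_t i = 0; i < batch && next_to_issue < total_objects; ++i)
            rmem.issue_prefetch(next_to_issue++);

        // 2. Compute on data that has already arrived, overlapping with I/O.
        while (!ready.empty()) {
            uint64_t id = ready.front();
            ready.pop_front();
            std::printf("processing object %llu\n",
                        static_cast<unsigned long long>(id));
        }

        // 3. Harvest completions; toy heuristic: grow the batch (up to a cap)
        //    while completions keep arriving.
        std::vector<uint64_t> done = rmem.poll_completions();
        ready.insert(ready.end(), done.begin(), done.end());
        if (!done.empty()) batch = std::min<size_t>(batch * 2, 32);
    }
    return 0;
}
```

In a real system, step 1 would post one-sided remote reads, step 3 would poll a completion queue, and the batch size would adapt to observed latency and request backlog rather than the fixed doubling shown here.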