Abstract:
Existing in-memory object caching systems are bottlenecked by the latency overhead of traditional Ethernet and the limited DRAM capacity of the servers. Modern high-performance I/O technologies such as RDMA and NVMe offer a promising way to address these challenges. In this paper, we focus on the data-plane efficiency of in-memory object caching systems and study how the widely deployed Memcached can exploit high-performance I/O for fast message transfer and cost-effective storage extension. First, the communication protocol is redesigned around RDMA semantics, and different transfer strategies are selected according to the Memcached operation type and message payload size to minimize overall latency. Second, Memcached is extended to incorporate NVMe SSDs and expand storage capacity. A circular log structure manages the two-level DRAM-SSD hierarchy, and the SSD is accessed directly from user space to reduce software overhead. Finally, we present U2cache, a JVM-enabled caching system. U2cache significantly improves performance by bypassing both the OS kernel and the JVM runtime. Latency is further hidden by pipelining and overlapping memory copies, RDMA transfers, and SSD accesses. Benchmarking results indicate that U2cache approaches the optimal performance of the underlying RDMA interconnect. With careful optimization for transferring large messages, performance improves by a further 20%. For data residing on the SSD, access latency is reduced by up to 31% compared with kernel-based I/O.