Memcached Optimization on High Performance I/O Technology

An Zhongqi; Du Hao; Li Qiang; Huo Zhigang; Ma Jie

doi:10.7544/issn1000-1239.2018.20160890

An Zhongqi¹, Du Hao¹,², Li Qiang¹, Huo Zhigang¹, Ma Jie¹

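¹(High Performance Computer Research Center, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190)
²(School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049) (anzhongqi@ncic.ac.cn)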

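Funding: National Key Research and Development Program of China (2016YFB0200204, 2016YFB0200300); National Natural Science Foundation of China, Young Scientists Fund (61402444, 61502454)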

CLC number: TP316
Publication history
- Published: 2018-03-31

Abstract

Abstract: In-memory object caching systems are bottlenecked in communication by the high latency of traditional Ethernet and in storage by the amount of DRAM that can be deployed in a server. Modern high-performance I/O technologies such as RDMA and NVMe offer a way to improve performance and extend capacity. Taking the widely deployed Memcached as an example, this paper focuses on the data path of in-memory object caching systems and uses high-performance I/O to accelerate communication and extend storage. First, the communication protocol is redesigned on remote direct memory access (RDMA) semantics, and different strategies are applied according to the Memcached operation type and message size to reduce communication latency. Second, high-performance NVMe SSDs are used to extend Memcached storage: a circular log structure manages the two-level hierarchy of DRAM and SSD, and a user-level driver accesses the SSD directly from user space to reduce software overhead. Finally, U2cache, a high-performance caching system that supports the JVM environment, is implemented. U2cache significantly reduces data access overhead by bypassing both the OS kernel and the JVM runtime and by pipelining and overlapping memory copy, RDMA transfer, and SSD access. Experimental results show that the communication latency of U2cache approaches the performance of the underlying RDMA hardware; for large messages, performance improves by more than 20% over the unoptimized version; and for data located in SSD, access latency is reduced by up to 31% compared with access through the kernel I/O software stack.
- Memcached /
- remote direct memory access (RDMA) /
- NVMe SSD /
- Java virtual machine (JVM) /
- user-level I/O
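
The abstract states only that a log structure manages the two-level hierarchy of DRAM and SSD; it does not give the layout or eviction rules. As a rough illustration of the general idea, the following toy model keeps the newest entries in a bounded "DRAM" tier, spills the oldest ones to a bounded "SSD" tier, and lets the SSD tier wrap around. The class name `TwoLevelLog`, the slot counts, and the first-in-first-out spill policy are all assumptions made for this sketch and are not taken from U2cache.

```python
from collections import OrderedDict


class TwoLevelLog:
    """Toy log-ordered store: newest entries in 'DRAM', older ones in 'SSD'.

    Hypothetical illustration only; both tiers are plain in-memory maps.
    """

    def __init__(self, dram_slots, ssd_slots):
        self.dram_slots = dram_slots
        self.ssd_slots = ssd_slots
        self.dram = OrderedDict()  # key -> value, oldest entry first
        self.ssd = OrderedDict()   # stand-in for an NVMe SSD region

    def set(self, key, value):
        # A rewrite appends a new version; any older copy is discarded.
        self.dram.pop(key, None)
        self.ssd.pop(key, None)
        self.dram[key] = value
        # When the DRAM tier is full, its oldest entry moves to the SSD tier.
        while len(self.dram) > self.dram_slots:
            old_key, old_value = self.dram.popitem(last=False)
            self.ssd[old_key] = old_value
        # When the SSD tier wraps around, its oldest entry is overwritten.
        while len(self.ssd) > self.ssd_slots:
            self.ssd.popitem(last=False)

    def get(self, key):
        """Return (value, tier), or (None, None) on a miss."""
        if key in self.dram:
            return self.dram[key], "dram"
        if key in self.ssd:
            return self.ssd[key], "ssd"
        return None, None


# Five writes into a store with two DRAM slots and two SSD slots.
log = TwoLevelLog(dram_slots=2, ssd_slots=2)
for i in range(5):
    log.set(f"k{i}", i)
```

In this run the two most recent keys stay in the DRAM tier, the two before them sit in the SSD tier, and the oldest key has been overwritten. A real system would also need an index that records where each item lives and an I/O path to the device, neither of which this sketch models.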