ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (3): 649-659. doi: 10.7544/issn1000-1239.2020.20180799

• System Architecture •

Optimization of the Key-Value Storage System Based on Fused User-Level I/O

An Zhongqi (安仲奇)1, Zhang Yunyao (张云尧)1,2, Xing Jing (邢晶)1, Huo Zhigang (霍志刚)1,2

  1(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190); 2(School of Computer and Control Engineering, University of Chinese Academy of Sciences, Beijing 100049) (anzhongqi@ncic.ac.cn)
  • Online: 2020-03-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2018YFC0809300) and the Young Scientists Fund of the National Natural Science Foundation of China (61502454).

Abstract: Traditional distributed key-value storage systems are commonly designed around the conventional Socket and POSIX I/O interfaces provided by the operating system. Limited by these interface semantics and by OS kernel overhead, such systems struggle to exploit the high throughput and low latency of modern network and storage hardware. Focusing on the data path of key-value storage, this paper proposes a fused user-level I/O approach for high-speed Ethernet and NVMe (non-volatile memory express) SSDs, co-designing the network stack and the storage I/O stack in user space to improve throughput and latency consistency. In the control plane of the fused I/O stack, a single processor core in a single context cooperatively manages the hardware queues of both the NIC and the SSD, eliminating the repeated kernel crossings, interrupts, context switches, and potential inter-core communication and data migration incurred by the traditional split design, and minimizing software-level control overhead. The data plane is driven by a unified memory pool for fused I/O access: with user-level device drivers, data is transferred directly by DMA between the key-value system and the device hardware, without extra copies or operating-system involvement. For requests with large payloads, the data is sliced and fed into successive DMA stages, and the access latency is further hidden by pipelining and overlapping the network and storage transfers. We present UKV, an all-user-space key-value system that supports a two-level DRAM-SSD storage hierarchy and the widely used Memcache interface, and evaluate it against Fatcache, the open-source key-value cache from Twitter. The experimental results show that the QPS of SSD-involved SET requests is improved by 14.97% to 97.78% and that of GET operations by 14.60% to 51.81%, while the p95 latency of SSD-involved SET requests is reduced by 26.12% to 40.90% and that of GET operations by 15.10% to 24.36%.
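
To make the slicing-and-overlapping idea concrete, below is a minimal C sketch. It is not code from UKV: the slice size, the two-slot buffer pool, and the placeholder functions recv_slice_from_nic() and write_slice_to_ssd() are illustrative assumptions, and the DMA transfers are only simulated with memcpy. The sketch shows how a large SET payload can be split into slices and double-buffered so that the receive of slice i+1 is issued before the write of slice i.

/*
 * Illustrative sketch only -- not code from UKV.  It models the
 * "slice a large value and overlap network DMA with storage DMA"
 * structure described in the abstract.  recv_slice_from_nic() and
 * write_slice_to_ssd() are hypothetical placeholders; real
 * kernel-bypass code would post asynchronous descriptors to
 * user-level NIC receive queues and NVMe submission queues.
 */
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SLICE_SZ 4096                 /* size of one pipeline slice          */
#define NSLICES  8                    /* a "large" value split into 8 slices */

static char wire[NSLICES][SLICE_SZ];  /* stands in for NIC receive buffers   */
static char ssd[NSLICES][SLICE_SZ];   /* stands in for NVMe data blocks      */

/* Placeholder for polling a user-level NIC RX queue and moving one slice. */
static void recv_slice_from_nic(char *dst, int i) { memcpy(dst, wire[i], SLICE_SZ); }

/* Placeholder for posting a user-level NVMe write command for one slice. */
static void write_slice_to_ssd(const char *src, int i) { memcpy(ssd[i], src, SLICE_SZ); }

int main(void) {
    /* Unified buffer pool shared by both transfer paths; two slots
     * are enough for a two-stage pipeline. */
    static char pool[2][SLICE_SZ];
    int i;

    for (i = 0; i < NSLICES; i++)          /* fabricate an incoming payload */
        memset(wire[i], 'a' + i, SLICE_SZ);

    /* Two-stage pipeline: the receive of slice i+1 is issued before the
     * write of slice i, so with asynchronous DMA the two stages would
     * overlap; the synchronous memcpy here only models the data movement. */
    recv_slice_from_nic(pool[0], 0);        /* prologue: fetch the first slice */
    for (i = 0; i < NSLICES; i++) {
        if (i + 1 < NSLICES)
            recv_slice_from_nic(pool[(i + 1) & 1], i + 1);
        write_slice_to_ssd(pool[i & 1], i);
    }

    for (i = 0; i < NSLICES; i++)           /* check the value landed intact */
        assert(memcmp(ssd[i], wire[i], SLICE_SZ) == 0);
    printf("stored %d slices (%d bytes) via the sliced pipeline\n",
           NSLICES, NSLICES * SLICE_SZ);
    return 0;
}

In a real kernel-bypass implementation, both placeholder functions would post asynchronous descriptors to device queues that are polled from the same core and context, so the network and storage DMA stages genuinely proceed in parallel in hardware rather than being merely interleaved as in this simulation.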

Key words: key-value storage system, kernel bypass, fused user-level I/O, high-speed Ethernet, NVMe SSD
