ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2020, Vol. 57 ›› Issue (11): 2432-2441.doi: 10.7544/issn1000-1239.2020.20190551

• 系统结构 • 上一篇    下一篇

面向键值存储的日志结构合并树优化技术

吴尚宇,谢婧雯,王毅   

  1. (深圳大学计算机与软件学院 广东深圳 518060) (shangyuwu1006@gmail.com)
  • 出版日期: 2020-11-01
  • 基金资助: 
    国家自然科学基金项目(61972259);广东省自然科学基金-杰出青年基金项目(2019B151502055);广东省自然科学基金项目(2017B030314073);深圳市基础研究项目(JCYJ20170817100300603)

Optimization of LSM-Tree for Key-Value Stores

Wu Shangyu, Xie Jingwen, Wang Yi   

  1. (College of Computer Science and Software Engineering, Shenzhen University, Shenzhen, Guangdong 518060)
  • Online: 2020-11-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61972259), the Guangdong Natural Science Foundation-Distinguished Young Scholars (2019B151502055), the Guangdong Natural Science Foundation (2017B030314073), and the Shenzhen Natural Science Foundation (JCYJ20170817100300603).

摘要: 日志结构合并树(log-structured merge tree, LSM-Tree)是一种针对写优化的数据结构,广泛应用于当代主流键值存储系统之中,用于处理当今世界海量多样化的数据.LSM-Tree通过批量处理的方式将随机写请求转换为顺序写请求,以保持极高的写效率.但LSM-Tree仍存在2个不足:一是数据的流动方向是单向的且固定不变.存储在LSM-Tree底部的数据将被一直保留底部,直到它们成为旧数据被压缩操作删除.访问这些数据将使读放大问题变得更加严重.二是LSM-Tree中的数据分布并未考虑访问频率的影响,这将导致访问延迟不平衡的问题.访问高频的低层数据将产生更高的访问延迟.提出了一种基于访问频率分布的上浮式键值存储结构(floating key-value, FloatKV).FloatKV首先在内存中提出了一种新的数据存储结构(LRU and FIFO, LRFO),其次在外存中设计了一种基于访问频率分布的上浮式键值存储策略.FloatKV记录外存中数据的访问频率,并根据访问频率来调整数据的存储位置,以减少访问延迟.为了验证FloatKV的可行性以及性能,使用标准数据库性能测试工具YSCB(yahoo! cloud serving benchmark)来进行评估,并将FloatKV与当前主流的技术进行比较.实验结果表明,FloatKV能够显著地提高读效率,并有效地减少了读放大问题.

关键词: 计算机体系结构, 键值存储, 日志结构合并树, 访问频率, 数据上浮

Abstract: LSM-Tree (log-structured merge tree) is widely used in contemporary mainstream key-value storage systems to process the vast amount of data. LSM-Tree converts random write requests into sequential write requests by maintaining a batch of requests in memory to achieve high write efficiency. However, there are still two shortages in LSM-Tree: First, the flow direction of data is unidirectional and fixed. Data stored at the bottom of LSM-Tree will remain there until they are deleted by compaction operation. This will make read amplification problem more serious. Second, the data distribution in LSM-Tree does not reflect access frequency. The data with different access frequencies could be stored in the same physical location. The data with high access frequency in lower layer will cost higher access latency. This paper presents FloatKV (floating key-value), an access frequency aware key-value storage strategy. FloatKV first proposes a structure called LRFO (LRU and FIFO) to manage data stored in memory. Then FloatKV presents a new strategy for floating process in external storage to move data closer to memory. FloatKV records the access frequency of data stored in the external storage and adjusts the storage location based on its access frequency. To verify the feasibility and performance of FloatKV, we conduct a series of experiments using standard benchmarks from YCSB (yahoo! cloud serving benchmark) and compare FloatKV with representative key-value store techniques. The experimental results show that FloatKV can significantly improve the reading efficiency and effectively reduce the read amplification problem.

Key words: computer architecture, key-value storage, LSM-Tree(log-structured merge tree), access frequency, data floating

中图分类号: