• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
高级检索

面向键值存储的日志结构合并树优化技术

吴尚宇, 谢婧雯, 王毅

吴尚宇, 谢婧雯, 王毅. 面向键值存储的日志结构合并树优化技术[J]. 计算机研究与发展, 2020, 57(11): 2432-2441. DOI: 10.7544/issn1000-1239.2020.20190551
引用本文: 吴尚宇, 谢婧雯, 王毅. 面向键值存储的日志结构合并树优化技术[J]. 计算机研究与发展, 2020, 57(11): 2432-2441. DOI: 10.7544/issn1000-1239.2020.20190551
Wu Shangyu, Xie Jingwen, Wang Yi. Optimization of LSM-Tree for Key-Value Stores[J]. Journal of Computer Research and Development, 2020, 57(11): 2432-2441. DOI: 10.7544/issn1000-1239.2020.20190551
Citation: Wu Shangyu, Xie Jingwen, Wang Yi. Optimization of LSM-Tree for Key-Value Stores[J]. Journal of Computer Research and Development, 2020, 57(11): 2432-2441. DOI: 10.7544/issn1000-1239.2020.20190551
吴尚宇, 谢婧雯, 王毅. 面向键值存储的日志结构合并树优化技术[J]. 计算机研究与发展, 2020, 57(11): 2432-2441. CSTR: 32373.14.issn1000-1239.2020.20190551
引用本文: 吴尚宇, 谢婧雯, 王毅. 面向键值存储的日志结构合并树优化技术[J]. 计算机研究与发展, 2020, 57(11): 2432-2441. CSTR: 32373.14.issn1000-1239.2020.20190551
Wu Shangyu, Xie Jingwen, Wang Yi. Optimization of LSM-Tree for Key-Value Stores[J]. Journal of Computer Research and Development, 2020, 57(11): 2432-2441. CSTR: 32373.14.issn1000-1239.2020.20190551
Citation: Wu Shangyu, Xie Jingwen, Wang Yi. Optimization of LSM-Tree for Key-Value Stores[J]. Journal of Computer Research and Development, 2020, 57(11): 2432-2441. CSTR: 32373.14.issn1000-1239.2020.20190551

面向键值存储的日志结构合并树优化技术

基金项目: 国家自然科学基金项目(61972259);广东省自然科学基金-杰出青年基金项目(2019B151502055);广东省自然科学基金项目(2017B030314073);深圳市基础研究项目(JCYJ20170817100300603)
详细信息
  • 中图分类号: TP391

Optimization of LSM-Tree for Key-Value Stores

Funds: This work was supported by the National Natural Science Foundation of China (61972259), the Guangdong Natural Science Foundation-Distinguished Young Scholars (2019B151502055), the Guangdong Natural Science Foundation (2017B030314073), and the Shenzhen Natural Science Foundation (JCYJ20170817100300603).
  • 摘要: 日志结构合并树(log-structured merge tree, LSM-Tree)是一种针对写优化的数据结构,广泛应用于当代主流键值存储系统之中,用于处理当今世界海量多样化的数据.LSM-Tree通过批量处理的方式将随机写请求转换为顺序写请求,以保持极高的写效率.但LSM-Tree仍存在2个不足:一是数据的流动方向是单向的且固定不变.存储在LSM-Tree底部的数据将被一直保留底部,直到它们成为旧数据被压缩操作删除.访问这些数据将使读放大问题变得更加严重.二是LSM-Tree中的数据分布并未考虑访问频率的影响,这将导致访问延迟不平衡的问题.访问高频的低层数据将产生更高的访问延迟.提出了一种基于访问频率分布的上浮式键值存储结构(floating key-value, FloatKV).FloatKV首先在内存中提出了一种新的数据存储结构(LRU and FIFO, LRFO),其次在外存中设计了一种基于访问频率分布的上浮式键值存储策略.FloatKV记录外存中数据的访问频率,并根据访问频率来调整数据的存储位置,以减少访问延迟.为了验证FloatKV的可行性以及性能,使用标准数据库性能测试工具YSCB(yahoo! cloud serving benchmark)来进行评估,并将FloatKV与当前主流的技术进行比较.实验结果表明,FloatKV能够显著地提高读效率,并有效地减少了读放大问题.
    Abstract: LSM-Tree (log-structured merge tree) is widely used in contemporary mainstream key-value storage systems to process the vast amount of data. LSM-Tree converts random write requests into sequential write requests by maintaining a batch of requests in memory to achieve high write efficiency. However, there are still two shortages in LSM-Tree: First, the flow direction of data is unidirectional and fixed. Data stored at the bottom of LSM-Tree will remain there until they are deleted by compaction operation. This will make read amplification problem more serious. Second, the data distribution in LSM-Tree does not reflect access frequency. The data with different access frequencies could be stored in the same physical location. The data with high access frequency in lower layer will cost higher access latency. This paper presents FloatKV (floating key-value), an access frequency aware key-value storage strategy. FloatKV first proposes a structure called LRFO (LRU and FIFO) to manage data stored in memory. Then FloatKV presents a new strategy for floating process in external storage to move data closer to memory. FloatKV records the access frequency of data stored in the external storage and adjusts the storage location based on its access frequency. To verify the feasibility and performance of FloatKV, we conduct a series of experiments using standard benchmarks from YCSB (yahoo! cloud serving benchmark) and compare FloatKV with representative key-value store techniques. The experimental results show that FloatKV can significantly improve the reading efficiency and effectively reduce the read amplification problem.
  • 期刊类型引用(10)

    1. 汤书苑,周一青,李锦涛,刘畅,石晶林. 基于特征校准的双注意力遮挡行人检测器. 西安电子科技大学学报. 2024(06): 25-39 . 百度学术
    2. 曹玉东,陈冬昊,曹睿,赵朗. 融合Mask R-CNN的在线多目标行人跟踪方法. 计算机工程与科学. 2023(07): 1216-1225 . 百度学术
    3. 曹健,陈怡梅,李海生,蔡强. 基于深度学习的道路小目标检测综述. 计算机工程. 2023(10): 1-12 . 百度学术
    4. 徐放,苗夺谦,张红云. 基于多粒度的Transformer目标检测算法. 计算机科学. 2023(11): 143-150 . 百度学术
    5. 陈立,张帆,郭威,黄赟. 面向遥感图像的多阶段特征融合目标检测方法. 电子学报. 2023(12): 3520-3528 . 百度学术
    6. 宋思雨,苗夺谦. 基于多粒度空间混乱的细粒度图像分类算法. 智能系统学报. 2022(01): 144-150 . 百度学术
    7. 潘云磊. 基于改进蚁群算法的行人运动特征跟踪提取方法. 河北北方学院学报(自然科学版). 2022(07): 1-6 . 百度学术
    8. 欧群雍,谭同德,袁红斌. 结合CNN和Bi-LSTM的多行人目标检测跟踪方法. 无线电工程. 2022(09): 1633-1641 . 百度学术
    9. 袁小鑫,胡军,黄永洪. 针对目标检测器的假阳性对抗样本. 计算机研究与发展. 2022(11): 2534-2548 . 本站查看
    10. 史进玲,张江维,程菊明. 多标记的不完备多粒度决策系统的最优粒度选择. 许昌学院学报. 2021(02): 103-108 . 百度学术

    其他类型引用(15)

计量
  • 文章访问数:  1103
  • HTML全文浏览量:  2
  • PDF下载量:  529
  • 被引次数: 25
出版历程
  • 发布日期:  2020-10-31

目录

    /

    返回文章
    返回