高级检索

    面向键值存储的日志结构合并树优化技术

    Optimization of LSM-Tree for Key-Value Stores

    • 摘要: 日志结构合并树(log-structured merge tree, LSM-Tree)是一种针对写优化的数据结构,广泛应用于当代主流键值存储系统之中,用于处理当今世界海量多样化的数据.LSM-Tree通过批量处理的方式将随机写请求转换为顺序写请求,以保持极高的写效率.但LSM-Tree仍存在2个不足:一是数据的流动方向是单向的且固定不变.存储在LSM-Tree底部的数据将被一直保留底部,直到它们成为旧数据被压缩操作删除.访问这些数据将使读放大问题变得更加严重.二是LSM-Tree中的数据分布并未考虑访问频率的影响,这将导致访问延迟不平衡的问题.访问高频的低层数据将产生更高的访问延迟.提出了一种基于访问频率分布的上浮式键值存储结构(floating key-value, FloatKV).FloatKV首先在内存中提出了一种新的数据存储结构(LRU and FIFO, LRFO),其次在外存中设计了一种基于访问频率分布的上浮式键值存储策略.FloatKV记录外存中数据的访问频率,并根据访问频率来调整数据的存储位置,以减少访问延迟.为了验证FloatKV的可行性以及性能,使用标准数据库性能测试工具YSCB(yahoo! cloud serving benchmark)来进行评估,并将FloatKV与当前主流的技术进行比较.实验结果表明,FloatKV能够显著地提高读效率,并有效地减少了读放大问题.

       

      Abstract: LSM-Tree (log-structured merge tree) is widely used in contemporary mainstream key-value storage systems to process the vast amount of data. LSM-Tree converts random write requests into sequential write requests by maintaining a batch of requests in memory to achieve high write efficiency. However, there are still two shortages in LSM-Tree: First, the flow direction of data is unidirectional and fixed. Data stored at the bottom of LSM-Tree will remain there until they are deleted by compaction operation. This will make read amplification problem more serious. Second, the data distribution in LSM-Tree does not reflect access frequency. The data with different access frequencies could be stored in the same physical location. The data with high access frequency in lower layer will cost higher access latency. This paper presents FloatKV (floating key-value), an access frequency aware key-value storage strategy. FloatKV first proposes a structure called LRFO (LRU and FIFO) to manage data stored in memory. Then FloatKV presents a new strategy for floating process in external storage to move data closer to memory. FloatKV records the access frequency of data stored in the external storage and adjusts the storage location based on its access frequency. To verify the feasibility and performance of FloatKV, we conduct a series of experiments using standard benchmarks from YCSB (yahoo! cloud serving benchmark) and compare FloatKV with representative key-value store techniques. The experimental results show that FloatKV can significantly improve the reading efficiency and effectively reduce the read amplification problem.

       

    /

    返回文章
    返回