一种wandering B+ tree问题解决方法

杨勇鹏; 蒋德钧

doi:10.7544/issn1000-1239.202220555

摘要: 为了应对磁盘和固态硬盘随机写和顺序写性能差异较大的问题，文件系统和块存储系统通常采用日志结构（log-structured）技术将随机写转换为顺序写. 因此，对于日志结构存储系统数据和元数据的修改都以异地写的方式执行. 在日志结构存储系统中，B+ tree常被用于管理元数据，这就会导致wandering B+ tree问题，即树结点异地更新会导致树结构递归更新. 目前，现有工作主要通过分离树结点的逻辑索引和物理地址，并使用额外的数据结构和物理设备空间存放树结点逻辑索引和物理地址的映射，从而避免递归更新树结构. 但现有方法既引入额外空间开销，又存在额外物理设备空间非顺序写的问题. 提出IBT B+ tree，将树结点逻辑索引和物理地址均存放在树结构中. 同时，基于IBT B+ tree结构引入dirty链表设计，并提出了非递归更新的IBT B+ tree下刷算法. IBT B+ tree既解决了wandering B+ tree问题，又不引入额外的数据结构和物理设备空间，消除了固定物理设备空间的非顺序写. 分别实现IBT B+ tree和基于F2FS中NAT设计的B+ tree，在此基础上设计实现Monty-Dev块存储系统以评价2棵B+ tree. 实验表明，在HDD和SSD介质上，IBT B+ tree在写放大和下刷效率方面均优于NAT B+ tree.

Abstract: In order to narrow the gap between the random write and sequential write performance of HDDs and SSDs, file systems and block storage systems usually use the log-structured technique to convert random write to sequential write. Therefore, modifications on log-structured storage system data and metadata are performed as out-of-place writes. In log-structured storage systems, B+ trees are often used to manage metadata. The tree node adopts the out-of-place update method, which will cause the tree node to be updated recursively, so it faces the wandering B+ tree problem. Currently, the main ideas of the existing methods are: The logical index and physical address of the tree node are separated, and a separate data structure and physical device space are used to store the mapping of the logical index and physical address of the tree node, thereby avoiding recursive updating of the tree node. However, the existing schemes not only introduce additional space overhead but also have the problem of non-sequential writing in the additional physical device space. We propose an IBT B+ tree, internal node based translation B+ tree, which embeds the logical index and physical addresses into the tree node. Based on the dirty linked list design, a non-recursive update algorithm for flushing the IBT B+ tree is proposed. The IBT B+ tree not only solves the problem of wandering B+ tree but also does not introduce additional data structure and space overhead. In this paper, the IBT B+ tree and the B+ tree designed by NAT, proposed in F2FS, are implemented respectively. On this basis, the Monty-Dev block storage system is designed and implemented to evaluate the two B+ trees. Experiments show that on HDD and SSD, the IBT B+ tree is better than the NAT B+ tree in both write amplification and flushing efficiency.

一种wandering B+ tree问题解决方法

A Method for Solving the wandering B+ tree Problem