Abstract:
In order to narrow the gap between the random write and sequential write performance of HDDs and SSDs, file systems and block storage systems usually use the log-structured technique to convert random write to sequential write. Therefore, modifications on log-structured storage system data and metadata are performed as out-of-place writes. In log-structured storage systems, B+ trees are often used to manage metadata. The tree node adopts the out-of-place update method, which will cause the tree node to be updated recursively, so it faces the wandering B+ tree problem. Currently, the main ideas of the existing methods are: The logical index and physical address of the tree node are separated, and a separate data structure and physical device space are used to store the mapping of the logical index and physical address of the tree node, thereby avoiding recursive updating of the tree node. However, the existing schemes not only introduce additional space overhead but also have the problem of non-sequential writing in the additional physical device space. We propose an IBT B+ tree, internal node based translation B+ tree, which embeds the logical index and physical addresses into the tree node. Based on the dirty linked list design, a non-recursive update algorithm for flushing the IBT B+ tree is proposed. The IBT B+ tree not only solves the problem of wandering B+ tree but also does not introduce additional data structure and space overhead. In this paper, the IBT B+ tree and the B+ tree designed by NAT, proposed in F2FS, are implemented respectively. On this basis, the Monty-Dev block storage system is designed and implemented to evaluate the two B+ trees. Experiments show that on HDD and SSD, the IBT B+ tree is better than the NAT B+ tree in both write amplification and flushing efficiency.