Citation: | Yang Yongpeng, Jiang Dejun. A Method for Solving the wandering B+ tree Problem[J]. Journal of Computer Research and Development, 2023, 60(3): 539-554. DOI: 10.7544/issn1000-1239.202220555 |
In order to narrow the gap between the random write and sequential write performance of HDDs and SSDs, file systems and block storage systems usually use the log-structured technique to convert random write to sequential write. Therefore, modifications on log-structured storage system data and metadata are performed as out-of-place writes. In log-structured storage systems, B+ trees are often used to manage metadata. The tree node adopts the out-of-place update method, which will cause the tree node to be updated recursively, so it faces the wandering B+ tree problem. Currently, the main ideas of the existing methods are: The logical index and physical address of the tree node are separated, and a separate data structure and physical device space are used to store the mapping of the logical index and physical address of the tree node, thereby avoiding recursive updating of the tree node. However, the existing schemes not only introduce additional space overhead but also have the problem of non-sequential writing in the additional physical device space. We propose an IBT B+ tree, internal node based translation B+ tree, which embeds the logical index and physical addresses into the tree node. Based on the dirty linked list design, a non-recursive update algorithm for flushing the IBT B+ tree is proposed. The IBT B+ tree not only solves the problem of wandering B+ tree but also does not introduce additional data structure and space overhead. In this paper, the IBT B+ tree and the B+ tree designed by NAT, proposed in F2FS, are implemented respectively. On this basis, the Monty-Dev block storage system is designed and implemented to evaluate the two B+ trees. Experiments show that on HDD and SSD, the IBT B+ tree is better than the NAT B+ tree in both write amplification and flushing efficiency.
[1] |
Reinsel D, Gantz J, Rydning J. The digitization of the world – From edge to core [EB/OL]. 2018[2022-03-01].https://www.seagate.com/files/www-content/our-story/trends/files/idc-seagate-dataage-whitepaper.pdf
|
[2] |
Western Digital. Ultrastar DC HC550 [EB/OL]. 2020[2022-03-01].https://www.westerndigital.com/products/internal-drives/data-center-drives/ultrastar-dc-hc550-hdd#0F38353
|
[3] |
Western Digital. Ultrastar DC HC650 [EB/OL]. 2020[2022-03-01].https://documents.westerndigital.com/content/dam/doc-library/en_us/assets/public/western-digital/product/data-center-drives/ultrastar-dc-hc600-series/data-sheet-ultrastar-dc-hc650.pdf
|
[4] |
Nimbus Data. ExaDrive DC series [EB/OL]. 2020[2022-03-01].https://nimbusdata.com/docs/ExaDrive-DC-Datasheet.pdf
|
[5] |
Rosenblum M, Ousterhout J K. The design and implementation of a log-structured file system[J]. ACM Transactions on Computer Systems, 1992, 10(1): 26−52 doi: 10.1145/146941.146943
|
[6] |
Konishi R, Amagai Y, Sato K, et al. The Linux implementation of a log-structured file system[J]. ACM SIGOPS Operating Systems Review, 2006, 40(3): 102−107 doi: 10.1145/1151374.1151375
|
[7] |
Chen Feng, Koufaty D A, Zhang Xiaodong. Understanding intrinsic characteristics and system implications of flash memory based solid state drives[J]. ACM SIGMETRICS Performance Evaluation Review, 2009, 37(1): 181−92 doi: 10.1145/2492101.1555371
|
[8] |
Bouganim L, Jónsson B, Bonnet P. uFLIP: Understanding flash IO patterns [J]. arXiv preprint, arXiv: 09091780, 2009
|
[9] |
Min C, Kim K, Cho H, et al. SFS: Random write considered harmful in solid state drives [C/OL]//Proc of the 10th USENIX Conf on File and Storage Technologies (FAST’12). Berkeley, CA: USENIX Association, 2012: 139−154
|
[10] |
Woodhouse D. JFFS: The journalling flash file system [C/OL]//Proc of the Ottawa Linux Symp. 2001[2022-02-22].https://www.kernel.org/doc/mirror/ols2001/jffs2.pdf
|
[11] |
Lee C, Sim D, Hwang J, et al. F2FS: A new file system for flash storage [C]//Poc of the 13th USENIX Conf on File and Storage Technologies (FAST’15). Berkeley, CA: USENIX Association, 2015: 273−286
|
[12] |
Davenport C. The Pixel 3 uses Samsung’s super-fast F2FS file system [EB/OL]. 2018[2022-03-25].https://www.androidpolice.com/2018/10/10/pixel-3-uses-samsungs-super-fast-f2fs-file-system/
|
[13] |
Bjørling M, Aghayev A, Holmberg H, et al. ZNS: Avoiding the block interface tax for flash-based SSDs [C]//Proc of 2021 USENIX Annual Technical Conf (USENIX ATC’21). Berkeley, CA: USENIX Association, 2021: 689−703
|
[14] |
Han K, Gwak H, Shin D, et al. ZNS+: Advanced zoned namespace interface for supporting in-storage zone compaction [C]//Proc of the 15th USENIX Symp on Operating Systems Design and Implementation (OSDI’21). Berkeley, CA: USENIX Association, 2021: 147−162
|
[15] |
Na Wenwu, Meng Xiaoxuan, Si Chengxiang, et al. A novel network RAID architecture with out-of-band virtualization and redundant management [C] //Proc of the 14th Int Conf on Parallel and Distributed Systems (ICPADS 2008). Piscataway, NJ: IEEE, 2008: 105−112
|
[16] |
柯剑,朱旭东,那文武,等. 动态地址映射虚拟存储系统[J]. 计算机工程,2009,35(16):17−19,22 doi: 10.3969/j.issn.1000-3428.2009.16.006
Ke Jian, Zhu Xudong, Na Wenwu, et al. Dynamic address mapping virtualization storage system[J]. Computer Engineering, 2009, 35(16): 17−19,22 (in Chinese) doi: 10.3969/j.issn.1000-3428.2009.16.006
|
[17] |
那文武,孟晓烜,柯剑,等. BW-VSDS:大容量、可扩展、高性能和高可靠性的网络虚拟存储系统[J]. 计算机研究与发展,2009,46(s2):88−95
Na Wenwu, Meng Xiaoxuan, Ke Jian, et al. BW-VSDS: A network virtual storage system with large capacity, graceful scalability, high performance and high availability[J]. Journal of Computer Research and Development, 2009, 46(s2): 88−95 (in Chinese)
|
[18] |
Bityutskiy A B. JFFS3 design issues [J/OL]. Memory Technology Device (MTD) Subsystem for Linux, 2005[2022-01-06]. http://linux-mtd.infradead.org/tech/JFFS3design.pdf
|
[19] |
Linux Foundation. Linux kernel[EB/OL]. [2021-11-01].https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
|
[20] |
Rodeh O. B-trees, shadowing, and clones[J]. ACM Transactions on Storage, 2008, 3(4): 1−27
|
[21] |
Rodeh O. B-trees, shadowing, and range-operations[R/OL]. IBM Research, 2006[2022-03-15].https://dominoweb.draco.res.ibm.com/reports/h-0248.pdf
|
[22] |
Bayer R, Schkolnick M. Concurrency of operations on B-trees[J]. Acta Informatica, 1977, 9(1): 1−21
|
[23] |
Kim J. f2fs-tools[EB/OL]. [2021-10-15].https://github.com/jaegeuk/f2fs-tools
|
[24] |
杨勇鹏. SSD 缓存系统的内元数据结构研究与实现[D]. 北京: 中国科学院大学, 2018
Yang Yongpeng. Research and implementation of memory metadata structure of SSD cache system[D]. Beijing: University of Chinese Academy of Sciences, 2018(in Chinese)
|
[25] |
Axboe J. Flexible I/O tester [EB/OL]. [2022-03-16].https://github.com/axboe/fio
|
[26] |
github. Filebench [EB/OL]. [2022-03-16].https://github.com/filebench/filebench
|