Abstract:
The emerging zoned namespace solid-state drive (ZNS SSD) addresses several critical issues inherent in traditional SSDs, including high write amplification, low storage density, and complex I/O paths, and thereby creates new opportunities for storage technologies. The B+tree, an efficient tree index structure, is widely used in databases and file systems to support the efficient loading of large models, the construction of external knowledge bases, and the management of structured metadata, which significantly improves training efficiency and knowledge invocation performance. However, because the hardware characteristics of ZNS SSDs differ from those of traditional block devices, directly deploying a B+tree on a ZNS SSD not only increases write amplification but also causes cascading updates, seriously degrading the performance of the storage system. To address these problems, we propose DPZB+tree, a B+tree index structure optimized for ZNS SSDs that incorporates persistent memory (PM). First, DPZB+tree adopts a hybrid DRAM-PM-ZNS SSD storage architecture that effectively separates hot data from cold data. Second, DPZB+tree introduces a lightweight hot/cold node identification strategy, which in turn improves I/O efficiency. Third, to address the limited capacity of PM, we present a dynamic node placement strategy that adaptively migrates data between PM and the ZNS SSD. Finally, we combine hardware characteristics with the principles of spatiotemporal locality to design the leaf node splitting and merging operations. DPZB+tree is implemented using a ZNS SSD simulator and Intel Optane PM. Experimental results show that, under various workloads, DPZB+tree outperforms LSM-tree, SSDB+tree, DZB+tree, and Baseline in terms of performance and recovery time.