Abstract:
The emerging Zoned Namespace Solid-State Drive (ZNS SSD) addresses several critical issues inherent in traditional SSDs, including high write amplification, low storage density, and complex I/O paths, and thereby creates new opportunities for storage technology. The B+tree, an efficient tree index structure, is widely used in databases and file systems to support the efficient loading of large models, the construction of external knowledge bases, and the management of structured metadata, which significantly improves training efficiency and knowledge invocation performance. However, because the hardware characteristics of ZNS SSDs differ from those of traditional block devices, deploying a B+tree directly on a ZNS SSD not only increases write amplification but also triggers cascading updates, which seriously degrade storage system performance. To address these problems, this paper proposes the DPZB+tree, a B+tree index structure optimized for ZNS SSDs that incorporates Persistent Memory (PM). First, the DPZB+tree adopts a hybrid DRAM-PM-ZNS SSD storage architecture that effectively separates hot and cold data. Second, it introduces a lightweight hot/cold node identification strategy that improves I/O efficiency. Third, to cope with the limited capacity of PM, it uses a dynamic node placement strategy that adaptively migrates data between PM and the ZNS SSD. Finally, it designs leaf node splitting and merging operations based on the hardware characteristics and the principles of spatiotemporal locality. The DPZB+tree was implemented using a ZNS SSD simulator and Intel Optane Persistent Memory. Experimental results show that, under various workloads, the DPZB+tree outperforms the LSM tree, SSDB+tree, DZB+tree, and Baseline in terms of performance and recovery time.
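
The interplay of hot/cold identification and capacity-bounded placement can be illustrated with a minimal sketch. It is not the DPZB+tree algorithm: the abstract does not specify the identification rule or the migration policy, so the access-count threshold, the "evict the coldest resident node" rule, and every name used here (`TieredPlacer`, `pm_capacity`, `hot_threshold`, `zns_log`) are assumptions made purely for illustration. The append-only list stands in for the sequential-write constraint of a zone.

```python
from dataclasses import dataclass, field


@dataclass
class TieredPlacer:
    """Toy placement model: hot nodes live in PM, cold nodes on the ZNS SSD."""
    pm_capacity: int                 # assumed: PM capacity counted in nodes
    hot_threshold: int = 3           # assumed: accesses needed to count as hot
    counts: dict = field(default_factory=dict)    # node_id -> access count
    pm: set = field(default_factory=set)          # node ids resident in PM
    zns_log: list = field(default_factory=list)   # append-only stand-in for a zone

    def access(self, node_id):
        """Record one access and promote the node once it becomes hot."""
        self.counts[node_id] = self.counts.get(node_id, 0) + 1
        if node_id not in self.pm and self.counts[node_id] >= self.hot_threshold:
            self._promote(node_id)

    def _promote(self, node_id):
        """Move a node into PM, demoting the coldest resident if PM is full."""
        if len(self.pm) >= self.pm_capacity:
            coldest = min(self.pm, key=lambda n: self.counts[n])
            if self.counts[coldest] >= self.counts[node_id]:
                return  # the candidate is not hotter than anything in PM
            self.pm.remove(coldest)
            self.zns_log.append(coldest)  # demotion is a sequential append
        self.pm.add(node_id)

    def tier(self, node_id):
        """Return the tier that currently holds the node."""
        return "PM" if node_id in self.pm else "ZNS"


placer = TieredPlacer(pm_capacity=1, hot_threshold=2)
for node in ["a", "a", "b", "b", "b"]:
    placer.access(node)
print(placer.tier("a"), placer.tier("b"), placer.zns_log)
```

In this toy run, node `a` is promoted first and then displaced once `b` has been accessed more often, so the demoted node is appended to the zone log instead of being updated in place.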