Abstract:
Persistent key-value (KV) stores play a critical role in many modern data-intensive applications, such as Web indexing, e-commerce, and cloud data storage systems. KV stores based on the log-structured merge tree (LSM-tree) have attracted growing attention because they eliminate random writes while maintaining acceptable read performance. However, they suffer from two performance issues. First, they rely on write-ahead log (WAL) files to guarantee the atomicity and safety of write operations and to enable recovery after a crash. Frequent WAL file updates cause severe write amplification and metadata overhead, which degrades performance. Second, these KV stores usually store their data on a conventional local filesystem, which can hurt performance because of unnecessary filesystem operations. In this paper, we present RocksFS, a filesystem optimized for LSM-tree-based KV stores. We simplify the filesystem by removing unnecessary functions and attributes to reduce filesystem overhead, and we redesign the format and I/O path of the WAL file to decrease metadata overhead. We compare RocksFS with conventional filesystems using RocksDB, a popular LSM-tree-based KV store. The experimental results demonstrate that RocksFS improves the write performance of RocksDB for small key-value data by up to 8x compared with traditional filesystems on both hard disk drives and solid-state drives.