Abstract:
In the information age, the explosive growth of data volume and the diversification of compression application scenarios place greater demands on the flexibility and efficiency of compression strategies. The LZ77 algorithm, a classical lossless compression method, is widely used in mainstream compression tools such as Zstandard (ZSTD). However, achieving higher compression ratios often requires larger history windows and more complex match-search strategies, which in ZSTD's LZ77 implementation cause frequent cache misses and reduce the efficiency of lazy matching. To address these issues, this paper proposes two optimization strategies. First, the Multi-Level Region Search Strategy (MLRS) introduces a hierarchical matching-region mechanism with access-threshold control, which adaptively adjusts the search depth and restricts the memory range accessed during matching to alleviate cache pressure. Second, the Extended Search-based Lazy Matching Strategy (ESLM) reuses search paths and applies approximate substitution to reduce redundant computation and improve matching efficiency. Both strategies are implemented and evaluated with ZSTD at compression level 12 on a Kunpeng 920 server platform. Experimental results show that MLRS significantly reduces last-level cache miss rates across the tested datasets and raises compression throughput to 118.34% to 149.50% of the baseline, while keeping the compression ratio at 94.65% to 99.58% of the baseline. ESLM achieves a 13.49% to 17.46% performance improvement and further improves the compression ratio on most datasets. When the two strategies are combined, compression speed rises to 134.53% to 171.17% of the original scheme, while the compression ratio stays within 94.18% to 99.80% of the original.
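To make the terms "history window" and "lazy matching" concrete, here is a minimal Python sketch of textbook LZ77 parsing with one-step lazy evaluation over a bounded window. It is an illustration under stated assumptions, not ZSTD's implementation and not MLRS or ESLM. A brute-force scan stands in for ZSTD's indexed match finders, and the names `longest_match`, `lz77_lazy`, `lz77_decode`, and the `window` and `min_len` parameters are hypothetical. Lazy matching means that before committing to a match at position `pos`, the encoder checks whether a longer match starts at `pos + 1`. If it does, the encoder emits one literal and defers.

```python
def longest_match(data: bytes, pos: int, window: int, min_len: int = 3) -> tuple[int, int]:
    """Brute-force scan of the previous `window` bytes for the longest match at pos.

    Returns (offset, length); (0, 0) means no match of at least min_len bytes.
    Hypothetical helper: real compressors use hash chains or trees instead.
    """
    best_len, best_off = 0, 0
    for cand in range(max(0, pos - window), pos):
        length = 0
        # Overlapping matches (cand + length >= pos) are allowed, as in LZ77.
        while pos + length < len(data) and data[cand + length] == data[pos + length]:
            length += 1
        if length > best_len:
            best_len, best_off = length, pos - cand
    return (best_off, best_len) if best_len >= min_len else (0, 0)


def lz77_lazy(data: bytes, window: int = 4096, min_len: int = 3) -> list[tuple]:
    """Greedy LZ77 parse with one-step lazy evaluation (hypothetical sketch)."""
    tokens: list[tuple] = []
    pos = 0
    while pos < len(data):
        off, length = longest_match(data, pos, window, min_len)
        if length == 0:
            tokens.append(("lit", data[pos]))
            pos += 1
            continue
        # Lazy step: if the match starting one byte later is longer,
        # emit a literal now and reconsider from pos + 1.
        if pos + 1 < len(data):
            _, next_len = longest_match(data, pos + 1, window, min_len)
            if next_len > length:
                tokens.append(("lit", data[pos]))
                pos += 1
                continue
        tokens.append(("match", off, length))
        pos += length
    return tokens


def lz77_decode(tokens: list[tuple]) -> bytes:
    """Rebuild the input by copying literals and back-references byte by byte."""
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, off, length = tok
            for _ in range(length):
                out.append(out[-off])  # byte-wise copy handles overlapping matches
    return bytes(out)
```

In this toy version, every extra byte of `window` adds a candidate position to scan. The lazy step also roughly doubles the searches per committed match. Together these show in miniature why larger windows and lazy evaluation increase memory traffic, which is the cost the abstract attributes to cache misses.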