Zstandard中LZ77压缩算法的高效匹配策略设计与实现

韦芳宇; 陈天韵; 周洋; 谢雨来; 王芳

doi:10.7544/issn1000-1239.202550480

Zstandard中LZ77压缩算法的高效匹配策略设计与实现

Design and Implementation of Efficient Matching Strategies for LZ77 Compression Algorithm in Zstandard

摘要

摘要: 在信息时代背景下，数据规模的急剧增长与压缩应用场景的多样化，对压缩策略的灵活性和效率提出了更高要求。LZ77算法是典型的无损压缩方法，广泛应用于Zstandard（ZSTD）等主流压缩工具中。然而，更高的压缩率指标要求应用更大的历史窗口与更复杂的压缩策略，导致在实现ZSTD中的LZ77算法存在缓存频繁未命中与延迟匹配效率低下的问题。为此，提出2项优化策略：其一，多级区域搜索策略（multi-level region search strategy，MLRS），通过引入匹配区域分级与访问阈值控制机制灵活调整搜索深度，限制匹配过程中的数据访问范围，缓解缓存压力；其二，基于扩展搜索的延迟匹配策略（extended search-based lazy matching strategy，ESLM），通过复用搜索路径并采用近似替代技术，降低冗余计算的同时提升匹配效率。上述优化策略基于ZSTD level 12 配置在鲲鹏920服务器平台上实现并完成验证。实验结果表明：MLRS在多种数据集上能够显著降低压缩过程中的末级缓存未命中率，将压缩率保持在94.65%~99.58%的同时，使压缩吞吐量提升至原方案的118.34%~149.50%；ESLM可将压缩吞吐量提升至原方案的113.49%~117.46%，且在多数数据集上进一步提高压缩比；当两者联合应用时，压缩速度可提升至原方案的134.53%~171.17%，同时维持94.18%~99.80%的压缩率。

Abstract: In the context of the information age, the explosive growth of data volume and the diversification of compression application scenarios raise higher demands for the flexibility and efficiency of compression strategies. The LZ77 algorithm, a classical lossless compression method, is widely used in mainstream compression tools such as Zstandard (ZSTD). However, achieving higher compression ratios often requires larger history windows and more complex compression strategies, which in turn lead to frequent cache misses and reduced efficiency of lazy matching in ZSTD’s LZ77 implementation. To address these issues, two optimization strategies are proposed. First, the multi-level region search strategy (MLRS) introduces a hierarchical matching region mechanism along with access threshold control, enabling adaptive adjustment of search depth and restricting memory access range during the matching process to alleviate cache pressure. Second, the extended search-based lazy matching strategy (ESLM) reuses search paths and incorporates approximate substitution techniques to reduce redundant computation while improving matching efficiency. These strategies are implemented and evaluated under ZSTD level 12 configuration on Kunpeng 920 server platform. Experimental results demonstrate that MLRS significantly reduces last-level cache miss rates across various datasets and improves compression throughput to 118.34%−149.50% of the baseline while maintaining a compression ratio within 94.65%−99.58%. ESLM improves compression throughput to 113.49%−117.46% of the original while further increasing the compression ratio on most datasets. When both strategies are applied jointly, the compression speed increases to 134.53%−171.17% of the original scheme, while sustaining a compression ratio in the range of 94.18%−99.80%.

HTML全文

参考文献(28)

施引文献

资源附件(0)