高级检索

    Zstandard中LZ77压缩算法的高效匹配策略设计与实现

    Design and Implementation of Efficient Matching Strategies for LZ77 Compression in Zstandard

    • 摘要: 在信息时代背景下,数据规模的急剧增长与压缩应用场景的多样化,对压缩策略的灵活性和效率提出了更高要求。LZ77算法是典型的无损压缩方法,广泛应用于ZSTD(Zstandard)等主流压缩工具中。然而,更高的压缩率指标要求应用更大的历史窗口与更复杂的压缩策略,导致在实现ZSTD中的LZ77算法存在缓存频繁未命中与延迟匹配效率低下的问题。为此,提出2项优化策略:其一,多级区域搜索策略(multi-level region search strategy,MLRS),通过引入匹配区域分级与访问阈值控制机制灵活调整搜索深度,限制匹配过程中的数据访问范围,缓解缓存压力;其二,基于扩展搜索的延迟匹配优化策略(extended search-based lazy matching strategy,ESLM),通过复用搜索路径并采用近似替代技术,降低冗余计算的同时提升匹配效率。上述优化策略基于ZSTD level 12 配置在鲲鹏920服务器平台上实现并完成验证。实验结果表明:MLRS在多种数据集上能够显著降低压缩过程中的末级缓存未命中率,将压缩率保持在94.65%~99.58%区间的同时,使压缩吞吐量提升至原方案的118.34%~149.50%;ESLM可实现13.49%~17.46%的性能提升,且在多数数据集上进一步提高压缩比;当两者联合应用时,压缩速度可提升至原方案的134.53%~171.17%,同时维持94.18%~99.80%的压缩率。

       

      Abstract: In the context of the information age, the explosive growth of data volume and the diversification of compression application scenarios raise higher demands for the flexibility and efficiency of compression strategies. The LZ77 algorithm, a classical lossless compression method, is widely used in mainstream compression tools such as Zstandard (ZSTD). However, achieving higher compression ratios often requires larger history windows and more complex compression strategies, which in turn lead to frequent cache misses and reduced efficiency of lazy matching in ZSTD’s LZ77 implementation. To address these issues, two optimization strategies are proposed. First, the Multi-Level Region Search Strategy (MLRS) introduces a hierarchical matching region mechanism along with access threshold control, enabling adaptive adjustment of search depth and restricting memory access range during the matching process to alleviate cache pressure. Second, the Extended Search-based Lazy Matching Strategy (ESLM) reuses search paths and incorporates approximate substitution techniques to reduce redundant computation while improving matching efficiency. These strategies are implemented and evaluated under the ZSTD level 12 configuration on the Kunpeng 920 server platform. Experimental results demonstrate that MLRS significantly reduces last-level cache miss rates across various datasets and improves compression throughput to 118.34%-149.50% of the baseline while maintaining a compression ratio within 94.65%-99.58%. ESLM achieves a 13.49%-17.46% performance improvement and further enhances compression ratio on most datasets. When both strategies are applied jointly, the compression speed increases to 134.53%-171.17% of the original scheme, while sustaining a compression ratio in the range of 94.18%-99.80%.

       

    /

    返回文章
    返回