
    BeeZip2: A Domain-Specific Accelerator for High Performance Lossless Data Compression

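    • Summary: Domain-specific accelerator design promises to further improve the performance of data compression algorithms so that they can keep pace with larger-scale data processing. The emerging Zstandard compression software is based on the LZ77 compression algorithm and offers performance advantages, but its control-flow data dependencies and enlarged sliding window limit what an accelerator can achieve. The new data compression accelerator BeeZip2 applies a cross-layer "algorithm-architecture" optimization approach. First, it integrates "MetaHistory match" with a parallel hash table design to address the control-flow data dependency problem. Then, BeeZip2 adopts a "shared match processing unit" architecture and organization to reduce the overhead of the large sliding window. In addition, BeeZip2 includes a "simple lazy match" strategy and architecture design that improve the utilization of "MetaHistory match" and the "shared processing units". Experimental results show that BeeZip2 matches the software compression ratio while reaching a throughput of up to 13.13 GB/s, which improves on single-core and 36-core CPU software throughput by 29.2 times and 3.35 times, respectively. Compared with the baseline accelerator BeeZip, and under the constraint of a compression ratio higher than that of software, BeeZip2 improves throughput by 1.26 times and throughput per unit area by 2.02 times.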

       

      Abstract: High-performance and intelligent computing applications require massive amounts of data, and transferring and storing that data poses challenges for computer systems. Data compression algorithms reduce storage and transmission costs, which makes them crucial for improving system efficiency. Domain-specific hardware design is an effective way to accelerate data compression algorithms. The emerging data compression software Zstandard significantly improves throughput and compression ratio. Zstandard is based on the LZ77 compression algorithm, but it uses a larger sliding window, which increases on-chip storage overhead, and it has complex data dependencies and control flow. These features limit the benefit of hardware acceleration. To improve the throughput of data compression accelerators with large sliding windows while achieving a similar compression ratio, we propose a cross-layer optimization approach based on algorithm-architecture co-design and use it to develop a novel data compression acceleration architecture, BeeZip2. First, we introduce the MetaHistory match method into the design of the parallel hash table for the large sliding window, offering regular parallelism and addressing control-flow data dependency. Then, we propose the shared match PE architecture, which distributes the large sliding window across multiple processing units so that they share on-chip memory and reduce overhead. In addition, the Lazy match strategy and its corresponding architecture help to fully leverage these resources for a higher compression ratio. Experimental results show that BeeZip2 achieves a throughput of up to 13.13 GB/s while maintaining the software compression ratio. Compared with single-core and 36-core CPU software implementations, throughput increases by 29.2 times and 3.35 times, respectively. Compared with the baseline accelerator BeeZip, BeeZip2 achieves a 1.26 times throughput improvement and a 2.02 times throughput-per-area improvement under the constraint of maintaining a higher compression ratio than its software counterpart.
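
      As background for the LZ77 terms used above (sliding window, hash table, lazy match), the following is a minimal software sketch of generic hash-based LZ77 match finding with a one-byte lazy step. It is not BeeZip2's hardware design, its MetaHistory match method, or Zstandard's implementation; the names `compress`, `decompress`, `MIN_MATCH`, and `WINDOW`, and the values chosen for the constants, are illustrative assumptions.

```python
from typing import Dict, List, Tuple, Union

MIN_MATCH = 4      # assumed minimum match length (illustrative value)
WINDOW = 1 << 15   # assumed sliding-window size in bytes (illustrative value)

# A token is either a literal byte (int) or a back-reference (offset, length).
Token = Union[int, Tuple[int, int]]


def _match_len(data: bytes, cand: int, pos: int) -> int:
    """Count how many bytes agree starting at cand (history) and pos (current)."""
    n = 0
    while pos + n < len(data) and data[cand + n] == data[pos + n]:
        n += 1
    return n


def compress(data: bytes, lazy: bool = True) -> List[Token]:
    """Generic LZ77 tokenizer: hash table of recent positions plus lazy matching."""
    table: Dict[bytes, int] = {}   # MIN_MATCH-byte key -> most recent position
    tokens: List[Token] = []
    pos = 0

    def insert(p: int) -> None:
        if p + MIN_MATCH <= len(data):
            table[data[p:p + MIN_MATCH]] = p

    def find(p: int) -> Tuple[int, int]:
        """Return (offset, length) of the candidate match at p, or (0, 0)."""
        if p + MIN_MATCH > len(data):
            return (0, 0)
        cand = table.get(data[p:p + MIN_MATCH])
        if cand is None or p - cand > WINDOW:   # outside the sliding window
            return (0, 0)
        return (p - cand, _match_len(data, cand, p))

    while pos < len(data):
        off, length = find(pos)
        insert(pos)
        if lazy and length >= MIN_MATCH:
            # Lazy step: if starting one byte later gives a longer match,
            # emit a literal now and let the next iteration take that match.
            _, next_length = find(pos + 1)
            if next_length > length:
                tokens.append(data[pos])
                pos += 1
                continue
        if length >= MIN_MATCH:
            tokens.append((off, length))
            for p in range(pos + 1, pos + length):
                insert(p)          # keep the table current for skipped bytes
            pos += length
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens


def decompress(tokens: List[Token]) -> bytes:
    """Rebuild the input; copies byte by byte so overlapping matches work."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            off, length = tok
            for _ in range(length):
                out.append(out[-off])
        else:
            out.append(tok)
    return bytes(out)
```

      In this sketch every position depends on the outcome of the previous match decision, which is the kind of sequential control flow the abstract identifies as an obstacle to hardware parallelism, and the hash table must cover the whole window, so its cost grows with the window size.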

       
