    BeeZip2: High Performance Lossless Data Compression Domain-Specific Accelerator

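    • Summary: Domain-specific accelerator architectures promise to further improve the performance of data compression algorithms and keep pace with larger-scale data processing. The emerging Zstandard compression software is based on the LZ77 compression algorithm and offers performance advantages, but two of its characteristics, control-flow data dependency and an enlarged sliding window, limit what an accelerator architecture can achieve. BeeZip2, a novel data compression accelerator architecture, applies an algorithm-architecture cross-layer optimization method. First, it combines MetaHistory Match with a parallel hash table design to address the control-flow data dependency. Next, BeeZip2 adopts a Shared Match PE architecture and organization to reduce the overhead of the large sliding window. In addition, BeeZip2 includes a Simple Lazy Match strategy and architecture that improve the utilization of MetaHistory Match and the shared processing units. Experimental results show that BeeZip2 matches the compression ratio of the software while reaching a throughput of up to 13.13 GB/s, a 29.2× and 3.35× improvement over the single-core and 36-core CPU software, respectively. Compared with the baseline accelerator BeeZip, and under the constraint that the compression ratio exceed that of the software, BeeZip2 improves throughput by 1.26× and throughput per area by 2.02×.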
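
    The reported speedups can be turned back into approximate CPU throughputs with a little arithmetic. The sketch below assumes, and this is an assumption not stated in the abstract, that both speedup factors are plain ratios measured against the 13.13 GB/s peak figure. The function name `implied_cpu_throughput` is hypothetical.

```python
def implied_cpu_throughput(peak_gbps: float, speedup: float) -> float:
    """Back out a baseline throughput from a peak figure and a speedup ratio.

    Assumption: speedup = accelerator throughput / baseline throughput.
    """
    return peak_gbps / speedup


# Figures quoted in the abstract.
PEAK_GBPS = 13.13        # BeeZip2 peak throughput, GB/s
SPEEDUP_1_CORE = 29.2    # versus single-core CPU software
SPEEDUP_36_CORE = 3.35   # versus 36-core CPU software

cpu_1_core = implied_cpu_throughput(PEAK_GBPS, SPEEDUP_1_CORE)
cpu_36_core = implied_cpu_throughput(PEAK_GBPS, SPEEDUP_36_CORE)

# How much the software gains from 36 cores under the same assumption.
cpu_scaling = cpu_36_core / cpu_1_core

print(f"single-core CPU: {cpu_1_core:.2f} GB/s")
print(f"36-core CPU:     {cpu_36_core:.2f} GB/s")
print(f"36-core scaling: {cpu_scaling:.1f}x")
```

    Under that assumption the software baseline would sit at roughly 0.45 GB/s on one core and 3.9 GB/s on 36 cores, so the software itself would scale by under 9× across 36 cores.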

       

      Abstract: High-performance and intelligent computing applications require massive amounts of data, and transferring and storing that data poses challenges for computer systems. Data compression algorithms reduce storage and transmission costs, making them crucial for improving system efficiency. Domain-specific hardware design is an effective way to accelerate data compression algorithms. The emerging data compression utility Zstandard significantly improves both throughput and compression ratio. Zstandard is based on the LZ77 compression algorithm, but it uses a larger sliding window, which increases on-chip storage overhead, and it has complex data dependencies and control flow. These features limit the benefit of hardware acceleration. To improve the throughput of data compression accelerators while achieving a similar compression ratio in the context of large sliding windows, this paper proposes a cross-layer optimization approach based on algorithm-architecture co-design and uses it to develop a novel data compression acceleration architecture, BeeZip2. First, this paper introduces the MetaHistory Match method into the design of the large-sliding-window parallel hash table, offering regular parallelism and addressing the control-flow data dependency. Then, this paper proposes the Shared Match PE architecture, which distributes the large sliding window across multiple processing units so that they share on-chip memory and reduce overhead. In addition, the Lazy Match strategy and its corresponding architecture help to fully leverage these resources for a higher compression ratio. Experimental results show that BeeZip2 achieves 13.13 GB/s throughput while maintaining the software compression ratio. Compared to single-core and 36-core CPU software implementations, throughput increases by 29.2× and 3.35×, respectively. Compared to the baseline accelerator BeeZip, BeeZip2 achieves a 1.26× throughput improvement and a 2.02× throughput-per-area improvement under the constraint of maintaining a higher compression ratio than its software counterpart.
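
      For readers unfamiliar with the terms used above, the following is a minimal software sketch of textbook LZ77 match finding with a hash table, a sliding window limit, and one-step lazy matching. It is not the BeeZip2 design, nor Zstandard's implementation: MetaHistory Match and the Shared Match PE are not modeled, and every name and parameter here (`MIN_MATCH`, `compress`, `window`, the token format) is an assumption chosen for illustration.

```python
MIN_MATCH = 4  # assumed minimum match length; also the hash key width


def _find_match(data, pos, table, window):
    """Return (length, offset) for the most recent candidate, or (0, 0)."""
    if pos + MIN_MATCH > len(data):
        return 0, 0
    cand = table.get(data[pos:pos + MIN_MATCH])
    if cand is None or pos - cand > window:
        return 0, 0  # no candidate, or candidate fell out of the window
    length = 0
    while pos + length < len(data) and data[cand + length] == data[pos + length]:
        length += 1
    return length, pos - cand


def _insert(data, pos, table):
    """Record the position of the MIN_MATCH-byte string starting at pos."""
    if pos + MIN_MATCH <= len(data):
        table[data[pos:pos + MIN_MATCH]] = pos


def compress(data, window=1 << 16, lazy=True):
    """Tokenize data into ("lit", byte) and ("match", offset, length) tuples."""
    table, tokens, pos = {}, [], 0
    while pos < len(data):
        length, offset = _find_match(data, pos, table, window)
        _insert(data, pos, table)
        if lazy and length >= MIN_MATCH:
            # Lazy matching: if the next position has a longer match,
            # emit one literal now and take that longer match instead.
            next_length, _ = _find_match(data, pos + 1, table, window)
            if next_length > length:
                length = 0
        if length >= MIN_MATCH:
            tokens.append(("match", offset, length))
            for p in range(pos + 1, pos + length):
                _insert(data, p, table)
            pos += length
        else:
            tokens.append(("lit", data[pos]))
            pos += 1
    return tokens


def decompress(tokens):
    """Rebuild the original bytes from the token stream."""
    out = bytearray()
    for token in tokens:
        if token[0] == "lit":
            out.append(token[1])
        else:
            _, offset, length = token
            for _ in range(length):
                out.append(out[-offset])  # byte-wise copy handles overlap
    return bytes(out)


sample = b"abcde_bcdefgh_abcdefgh_abcdefgh" * 4
assert decompress(compress(sample, lazy=True)) == sample
assert decompress(compress(sample, lazy=False)) == sample
```

      The sequential dependence is visible in the loop: the next position to search depends on the length of the match just found, which is the kind of control-flow data dependency the abstract says limits hardware parallelism. A larger `window` lets older candidates qualify, at the cost of keeping more history available.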

       
