Abstract:
High-performance and intelligent computing applications process massive volumes of data, and transferring and storing these data pose challenges for computer systems. Data compression algorithms reduce storage and transmission costs, making them crucial for improving system efficiency, and domain-specific hardware design is an effective way to accelerate them. Zstandard, an emerging data compression utility, significantly improves both throughput and compression ratio. It is based on the LZ77 compression algorithm but uses a larger sliding window, which increases on-chip storage overhead, and it exhibits complex data dependencies and control flow; these features limit the benefit of hardware acceleration. To improve throughput while achieving a comparable compression ratio for data compression accelerators with large sliding windows, this paper proposes a cross-layer optimization approach based on algorithm-architecture co-design and develops a novel data compression acceleration architecture, BeeZip2. First, this paper introduces the MetaHistory Match method into the design of the large-sliding-window parallel hash table, providing regular parallelism and resolving control-flow data dependencies. Second, this paper proposes the Shared Match PE architecture, which distributes the large sliding window across multiple processing elements so that they share on-chip memory and reduce storage overhead. In addition, the Lazy Match strategy and its corresponding architecture fully exploit hardware resources to achieve a higher compression ratio. Experimental results show that BeeZip2 achieves 13.13 GB/s throughput while maintaining the compression ratio of the software implementation. Compared with single-core and 36-core CPU software implementations, throughput increases by 29.2× and 3.35×, respectively. Compared with the baseline accelerator BeeZip, BeeZip2 achieves a 1.26× throughput improvement and a 2.02× improvement in throughput per area under the constraint of maintaining a compression ratio higher than that of its software counterpart.
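To make the matching step referenced above concrete, the following is a minimal software sketch, in C, of a hash-table match search with a one-step lazy-match heuristic of the kind LZ77-family compressors such as Zstandard build on. The window size, hash function, and helper names are illustrative assumptions for exposition only; they do not reflect Zstandard's internals or BeeZip2's hardware design.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Illustrative parameters only (not BeeZip2's or Zstandard's configuration). */
#define WINDOW_LOG   17                  /* 128 KiB sliding window              */
#define WINDOW_SIZE  (1u << WINDOW_LOG)
#define HASH_LOG     15
#define HASH_SIZE    (1u << HASH_LOG)
#define MIN_MATCH    4

/* Hash of the next 4 bytes at p. */
static uint32_t hash4(const uint8_t *p) {
    uint32_t v;
    memcpy(&v, p, 4);
    return (v * 2654435761u) >> (32 - HASH_LOG);
}

/* Length of the common prefix of p and q, bounded by limit. */
static uint32_t match_length(const uint8_t *p, const uint8_t *q, uint32_t limit) {
    uint32_t len = 0;
    while (len < limit && p[len] == q[len]) len++;
    return len;
}

/* Greedy search with a one-step lazy-match heuristic: a match found at
 * position i is committed only if the match starting at i+1 is not longer. */
static void lazy_match_demo(const uint8_t *src, uint32_t src_len) {
    static uint32_t head[HASH_SIZE];      /* hash table: last position per hash */
    memset(head, 0xFF, sizeof(head));     /* 0xFFFFFFFF marks "empty"           */

    uint32_t i = 0;
    while (i + MIN_MATCH < src_len) {
        uint32_t h = hash4(src + i);
        uint32_t cand = head[h];
        head[h] = i;

        uint32_t len = 0;
        if (cand != 0xFFFFFFFFu && i - cand <= WINDOW_SIZE)
            len = match_length(src + i, src + cand, src_len - i);

        if (len >= MIN_MATCH) {
            /* Lazy step: probe position i+1 before committing the match at i. */
            uint32_t cand2 = head[hash4(src + i + 1)];
            uint32_t len2 = 0;
            if (cand2 != 0xFFFFFFFFu && i + 1 - cand2 <= WINDOW_SIZE)
                len2 = match_length(src + i + 1, src + cand2, src_len - i - 1);

            if (len2 > len) {             /* defer: emit a literal, retry at i+1 */
                printf("literal '%c'\n", src[i]);
                i += 1;
            } else {                      /* commit the match found at i */
                printf("match: offset=%u length=%u\n", i - cand, len);
                i += len;
            }
        } else {
            printf("literal '%c'\n", src[i]);
            i += 1;
        }
    }
    for (; i < src_len; i++) printf("literal '%c'\n", src[i]);
}

int main(void) {
    const char *text = "abcabcabcabcxyzabcabc";
    lazy_match_demo((const uint8_t *)text, (uint32_t)strlen(text));
    return 0;
}
```

Even in this simplified form, the control flow illustrates the challenges the paper targets: the lazy decision makes the next step data-dependent, and the sliding window bounds how much history must stay resident, which in hardware translates into on-chip storage overhead.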