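• China's Outstanding Science and Technology Journal
  • CCF-recommended Class A Chinese journal
  • T1-class high-quality science and technology journal in the computing field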
Gao Ruihao, Shi Shunchen, Li Xueqi, Tan Guangming. BeeZip2: A Domain-Specific Accelerator for High Performance Lossless Data Compression[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550017

BeeZip2: A Domain-Specific Accelerator for High Performance Lossless Data Compression

Funds: This work was supported by the National Natural Science Foundation of China (T2125013, 62032023).

More Information
  • Author Bio:

    Gao Ruihao: born in 1998. PhD candidate. His main research interests include domain-specific architecture, data compression algorithms, and RTL simulation acceleration.

    Shi Shunchen: born in 2000. PhD candidate. His main research interests include domain-specific hardware accelerators, processing in memory, and computer architecture.

    Li Xueqi: born in 1991. PhD, associate professor. His main research interests include domain-specific hardware accelerators, processing in memory, and system-architecture cross-layer optimization.

    Tan Guangming: born in 1980. PhD, professor. His main research interests include parallel programming and algorithm design, domain-specific architecture, and bioinformatics.

  • Received Date: January 09, 2025
  • Revised Date: April 08, 2025
  • Available Online: April 13, 2025
  • High-performance and intelligent computing applications depend on massive volumes of data, and transferring and storing that data strains computer systems. Data compression algorithms reduce storage and transmission costs, making them crucial for improving system efficiency, and domain-specific hardware design is an effective way to accelerate them. Zstandard, an emerging data compression software package, significantly improves both throughput and compression ratio. It is based on the LZ77 compression algorithm but uses a larger sliding window, which increases on-chip storage overhead, and it has complex data dependencies and control flow. These features limit the benefit of hardware acceleration. To improve the throughput of a data compression accelerator with a large sliding window while achieving a similar compression ratio, we propose a cross-layer optimization approach based on algorithm-architecture co-design and use it to develop a novel data compression acceleration architecture, BeeZip2. First, we introduce the MetaHistory match method into the design of the large-sliding-window parallel hash table, which provides regular parallelism and addresses the control-flow data dependency. Then, we propose the shared match PE architecture, which distributes the large sliding window across multiple processing units so that they share on-chip memory and reduce overhead. In addition, the Lazy match strategy and its corresponding architecture make full use of these resources to achieve a higher compression ratio. Experimental results show that BeeZip2 achieves a throughput of 13.13 GB/s while maintaining the software compression ratio. Compared with single-core and 36-core CPU software implementations, throughput increases by 29.2 times and 3.35 times, respectively. Compared with the baseline accelerator BeeZip, BeeZip2 achieves a 1.26 times throughput improvement and a 2.02 times throughput-per-area improvement under the constraint of maintaining a higher compression ratio than its software counterpart.
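
    For readers unfamiliar with the terms used in the abstract, the following is a minimal software sketch of LZ77-style compression with a hash table for match finding and a one-step lazy match check. It is not the BeeZip2 architecture, the MetaHistory method, or Zstandard's actual implementation. The names MIN_MATCH, WINDOW, compress, and decompress, the 1 MiB window size, and the token format are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple, Union

MIN_MATCH = 4        # assumed minimum match length (illustrative)
WINDOW = 1 << 20     # assumed sliding-window size of 1 MiB (illustrative)

# A token is either a literal byte value or an (offset, length) back-reference.
Token = Union[int, Tuple[int, int]]


def _longest_match(data: bytes, pos: int, table: Dict[bytes, int]) -> Tuple[int, int]:
    """Return (length, offset) of the match found through the hash table, or (0, 0)."""
    if pos + MIN_MATCH > len(data):
        return 0, 0
    cand = table.get(data[pos:pos + MIN_MATCH])
    if cand is None or pos - cand > WINDOW:
        return 0, 0
    length = 0
    # Extend the match byte by byte; the source may overlap the current position.
    while pos + length < len(data) and data[cand + length] == data[pos + length]:
        length += 1
    return length, pos - cand


def _insert(data: bytes, pos: int, table: Dict[bytes, int]) -> None:
    """Record the most recent position of the MIN_MATCH-byte string starting at pos."""
    if pos + MIN_MATCH <= len(data):
        table[data[pos:pos + MIN_MATCH]] = pos


def compress(data: bytes, lazy: bool = True) -> List[Token]:
    table: Dict[bytes, int] = {}
    out: List[Token] = []
    pos = 0
    while pos < len(data):
        length, offset = _longest_match(data, pos, table)
        _insert(data, pos, table)
        if length >= MIN_MATCH and lazy and pos + 1 < len(data):
            # Lazy matching: if the next position offers a longer match,
            # emit the current byte as a literal and defer the decision.
            next_length, _ = _longest_match(data, pos + 1, table)
            if next_length > length:
                out.append(data[pos])
                pos += 1
                continue
        if length >= MIN_MATCH:
            out.append((offset, length))
            for p in range(pos + 1, pos + length):
                _insert(data, p, table)
            pos += length
        else:
            out.append(data[pos])
            pos += 1
    return out


def decompress(tokens: List[Token]) -> bytes:
    out = bytearray()
    for token in tokens:
        if isinstance(token, tuple):
            offset, length = token
            for _ in range(length):   # byte-wise copy handles overlapping matches
                out.append(out[-offset])
        else:
            out.append(token)
    return bytes(out)
```

    With the input b"abcdXbcdefghYabcdefgh", the greedy variant (lazy=False) accepts the 4-byte match for "abcd" as soon as it is found, whereas the lazy variant emits "a" as a literal and takes the longer 7-byte match that starts one byte later. A larger window lets the table reference older data, which is the source of the on-chip storage cost that the abstract describes.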
