Gao Ruihao, Shi Shunchen, Li Xueqi, Tan Guangming. BeeZip2: High Performance Lossless Data Compression Domain-Specific Accelerator[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550017

BeeZip2: High Performance Lossless Data Compression Domain-Specific Accelerator

Funds: This work was supported by the National Natural Science Foundation of China (T2125013, 62032023).
More Information
  • Author Bio:

    Gao Ruihao: born in 1998. PhD candidate. His main research interests include domain-specific architecture, data compression algorithms, and RTL simulation acceleration

    Shi Shunchen: born in 2000. PhD student. His main research interests include domain-specific hardware accelerators, processing in memory, and computer architecture

    Li Xueqi: born in 1991. PhD, associate professor. His main research interests include domain-specific hardware accelerators, processing in memory, and system-architecture cross-layer optimization

    Tan Guangming: born in 1980. PhD, professor. His main research interests include parallel programming and algorithm design, domain-specific architecture, and bioinformatics

  • Received Date: January 09, 2025
  • Revised Date: April 08, 2025
  • Available Online: April 13, 2025
  • High-performance and intelligent computing applications require massive amounts of data, and transferring and storing this data poses challenges for computer systems. Data compression algorithms reduce storage and transmission costs, making them crucial for improving system efficiency. Domain-specific hardware design is an effective way to accelerate data compression algorithms. The emerging data compression utility Zstandard significantly improves both throughput and compression ratio. Zstandard is based on the LZ77 compression algorithm, but its larger sliding window increases on-chip storage overhead, and it involves complex data dependencies and control flow. These features limit the effectiveness of hardware acceleration. To improve throughput while achieving a compression ratio comparable to software in the context of large sliding windows, this paper proposes a cross-layer optimization approach based on algorithm-architecture co-design and develops a novel data compression acceleration architecture, BeeZip2. First, this paper introduces the MetaHistory Match method into the design of the large-sliding-window parallel hash table, offering regular parallelism and resolving control-flow data dependencies. Second, it proposes the Shared Match PE architecture, which distributes the large sliding window across multiple processing elements that share on-chip memory, reducing storage overhead. In addition, the Lazy Match strategy and its corresponding architecture fully leverage hardware resources for a higher compression ratio. Experimental results show that BeeZip2 achieves 13.13 GB/s throughput while maintaining the software compression ratio. Compared with single-core and 36-core CPU software implementations, throughput increases by 29.2× and 3.35×, respectively. Compared with the baseline accelerator BeeZip, BeeZip2 achieves a 1.26× throughput improvement and a 2.02× throughput-per-area improvement under the constraint of maintaining a compression ratio higher than its software counterpart.
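    The abstract's building blocks, hash-table match search over a sliding window and the lazy-match heuristic of LZ77-family compressors, can be illustrated with a toy software parser. This is a minimal sketch of generic LZ77 lazy matching only, not the BeeZip2 hardware design or the Zstandard implementation; the single-entry hash table, the window size, and the 4-byte minimum match length are all simplifying assumptions made for illustration.

    ```python
    def lz77_lazy_compress(data: bytes, window: int = 1 << 15):
        """Toy LZ77 parser: hash-table match search plus a one-step lazy match.
        Returns tokens of the form ('lit', byte) or ('match', offset, length)."""
        table = {}  # 4-byte prefix -> most recent position (a single-entry "hash chain")

        def find_match(pos):
            if pos + 4 > len(data):
                return 0, 0
            key = data[pos:pos + 4]
            cand = table.get(key)
            table[key] = pos  # always record the newest occurrence
            if cand is None or pos - cand > window:
                return 0, 0
            # extend the match byte by byte past the 4-byte prefix
            length = 0
            while pos + length < len(data) and data[cand + length] == data[pos + length]:
                length += 1
            return pos - cand, length

        out, i = [], 0
        while i < len(data):
            off, length = find_match(i)
            if length >= 4:
                # lazy heuristic: if the next position yields a longer match,
                # emit this byte as a literal and take the later, longer match
                off2, len2 = find_match(i + 1)
                if len2 > length:
                    out.append(('lit', data[i]))
                    i += 1
                    off, length = off2, len2
                out.append(('match', off, length))
                i += length
            else:
                out.append(('lit', data[i]))
                i += 1
        return out

    def lz77_decompress(tokens):
        """Replay tokens; overlapping copies are resolved byte by byte."""
        out = bytearray()
        for t in tokens:
            if t[0] == 'lit':
                out.append(t[1])
            else:
                _, off, length = t
                for _ in range(length):
                    out.append(out[-off])
        return bytes(out)
    ```

    A hardware matcher parallelizes the `find_match` probes across positions, which is where the irregular control flow of the lazy decision becomes the bottleneck the paper's MetaHistory Match method is designed to address.
    
    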

  • [1]
    Li Shuang, Puig X, Paxton C, et al. Pre-trained language models for interactive decision-making[J]. Advances in Neural Information Processing Systems, 2022, 35: 31199−31212
    [2]
    Touvron H, Lavril T, Izacard G, et al. Llama: LLaMA: Open and efficient Foundation Language Models[J]. arXiv preprint, arXiv: 2302.13971, 2023. https://doi.org/10.48550/arXiv.2302.13971
    [3]
    Liu Haifeng, Zheng Long, Huang Yu, et al. Enabling efficient large recommendation model training with near CXL memory processing[C/OL]//Proc of 2024 ACM/IEEE 51st Annual Int Symp on Computer Architecture (ISCA). Piscataway, NJ: IEEE, 2024: 382−395[2024-09-21]. https://ieeexplore.ieee.org/document/10609725/. DOI: 10.1109/ISCA59077.2024.00036
    [4]
    Lee Y, Kim H, Rhu M. PreSto: An In-Storage Data Preprocessing System for Training Recommendation Models[C/OL]//2024 ACM/IEEE 51st Annual Int Symp on Computer Architecture (ISCA). Buenos Aires, Argentina: IEEE, 2024: 340−353[2024-09-06]. https://ieeexplore.ieee.org/document/10609715/. DOI: 10.1109/ISCA59077.2024.00033
    [5]
    Abramson J, Adler J, Dunger J, et al. Accurate structure prediction of biomolecular interactions with AlphaFold 3[J]. Nature, 2024, 630(8016): 493−500 doi: 10.1038/s41586-024-07487-w
    [6]
    Sayood K. Introduction to Data Compression[M]. Cambridge, MA 02139, United States: Morgan Kaufmann, 2017
    [7]
    Abali B, Blaner B, Reilly J, et al. Data compression accelerator on IBM POWER9 and z15 processors: Industrial product[C]//Proc of 2020 ACM/IEEE 47th Annual Int Symp on Computer Architecture (ISCA). Piscataway, NJ: IEEE, 2020: 1−14
    [8]
    Ziv J, Lempel A. A universal algorithm for sequential data compression[J/OL]. IEEE Transactions on Information Theory, 1977, 23(3): 337−343[2022-11-10]. http://ieeexplore.ieee.org/document/1055714/. DOI: 10.1109/TIT.1977.1055714
    [9]
    Collet Y. Zstandard: Real-time data compression algorithm[EB/OL]. [2024-12-25]. https://facebook.github.io/zstd/
    [10]
    Adler M. A Massively Spiffy Yet Delicately Unobtrusive Compression Library[EB/OL]. [2024-12-25]. https://www.zlib.net/
    [11]
    Terrell N. Zstd in the Linux Kernel[EB/OL]. [2024-12-25]. https://github.com/facebook/zstd/blob/dev/contrib/linux-kernel/README.md
    [12]
    Varma Santosh. Compression Methods in MongoDB: Snappy vs. Zstd[EB/OL]. [2024-12-25]. https://www.percona.com/blog/compression-methods-in-mongodb-snappy-vs-zstd/
    [13]
    Skibinski P. inikep/lzbench[CP/OL]. 2015 (2024-12-25)[2024-12-25]. https://github.com/inikep/lzbench
    [14]
    Hennessy J L, Patterson D A. A new golden age for computer architecture[J/OL]. Communications of the ACM, 2019, 62(2): 48−60[2024-12-25]. https://dl.acm.org/doi/ 10.1145/3282307. DOI: 10.1145/3282307
    [15]
    Dally W J, Turakhia Y, Han Song. Domain-specific hardware accelerators[J/OL]. Communications of the ACM, 2020, 63(7): 48−57[2024-12-25]. https://dl.acm.org/doi/ 10.1145/3361682. DOI: 10.1145/3361682
    [16]
    Deorowicz S. Silesia compression corpus[EB/OL]. 2003[2021-12-13]. http://sun.aei.polsl.pl/~sdeor/index.php?page=silesia
    [17]
    Matt Powell. Canterbury and Calgary compression corpora[EB/OL]. 1997(2001-01-08)[2021-12-13]. https://corpus.canterbury.ac.nz/descriptions/
    [18]
    nvCOMP: High-speed data compression using NVIDIA GPUs[EB/OL]. [2024-12-26]. https://developer.nvidia.com/nvcomp
    [19]
    Balasubramonian R, Kahng A B, Muralimanohar N, et al. CACTI 7: New tools for Interconnect exploration in innovative off-chip memories[J/OL]. ACM Transactions on Architecture and Code Optimization, 2017, 14(2): 1−25[2024-12-26]. https://dl.acm.org/doi/ 10.1145/3085572. DOI: 10.1145/3085572
    [20]
    Karandikar S, Mao H, Kim D, et al. FireSim: FPGA-accelerated cycle-exact scale-out system simulation in the public cloud[C/OL]//Proc of 2018 ACM/IEEE 45th Annual Int Symp on Computer Architecture (ISCA). Los Angeles, CA: IEEE, 2018: 29−42[2024-12-26]. https://ieeexplore.ieee.org/document/8416816/. DOI: 10.1109/ISCA.2018.00014
    [21]
    AMD Alveo U280: Product Brief[EB/OL]. 2020[2024-12-26]. https://www.xilinx.com/publications/product-briefs/alveo-u280-product-brief.pdf
    [22]
    Vitis Data Compression Library 2022.1[EB/OL]. 2022[2024-12-26]. https://xilinx.github.io/Vitis_Libraries/data_compression/2022.1/benchmark.html
    [23]
    Intel QAT: Performance, scale, and efficiency[EB/OL]. 2020[2024-12-26]. https://www.intel.com/content/www/us/en/architecture-and-technology/intel-quick-assist-technology-overview.html
    [24]
    Tian Jiannan, Di Sheng, Zhang Chengming, et al. waveSZ: A hardware-algorithm co-design of efficient lossy compression for scientific data[C/OL]//Proc of the 25th ACM SIGPLAN Symp on Principles and Practice of Parallel Programming. New York: ACM, 2020: 74−88[2021-03-03]. https://dl.acm.org/doi/ 10.1145/3332466.3374525. DOI: 10.1145/3332466.3374525
