• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search
Shi Hongbo, Fan Zhihua, Li Wenming, Zhang Zhiyuan, Mu Yudong, Ye Xiaochun, An Xuejun. NTT Butterfly Arithmetic Acceleration Based on Dataflow Architecture[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550160
Citation: Shi Hongbo, Fan Zhihua, Li Wenming, Zhang Zhiyuan, Mu Yudong, Ye Xiaochun, An Xuejun. NTT Butterfly Arithmetic Acceleration Based on Dataflow Architecture[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550160

NTT Butterfly Arithmetic Acceleration Based on Dataflow Architecture

Funds: 

undefined

This work was supported by the National Key Research and Development Program of China (2023YFB4503500), the Beijing Nova Program (20220484054, 20230484420), the Beijing Natural Science Foundation (L234078), and the CAS Project for Youth Innovation Promotion Association.

More Information
  • Author Bio:

    Shi Hongbo: born in 2002. Bachelor. His main research interests include dataflow architecture and high-performance computing

    Fan Zhihua: born in 1996. PhD, assistant professor. His main research interests include dataflow architecture, programming model, and reconfigurable architecture

    Li Wenming: born in 1988. PhD, associate professor. His main research interests include high-throughput processor architecture, dataflow architecture, and software simulation

    Zhang Zhiyuan: born in 2000. PhD candidate. His main research interests include high-performance computing, dataflow, and reconfigurable architecture

    Mu Yudong: born in 2001. PhD candidate. His main research interests include dataflow architecture, dataflow graph mapping, and reconfigurable architecture

    Ye Xiaochun: born in 1983. PhD,professor. His main research interests include high-performance computer architecture and software simulation

    An Xuejun: born in 1966. PhD,professor. His main research interests include programming model and processor architecture

  • Received Date: February 28, 2025
  • Revised Date: April 07, 2025
  • Available Online: April 16, 2025
  • Fully homomorphic encryption (FHE), which enables computation on encrypted data without decryption throughout the entire processing flow, offers a promising solution for privacy preservation in cloud computing and other distributed environments. However, the practical deployment of FHE remains significantly constrained by its high computational complexity, poor data locality, and limited parallelism. Among the core operations in FHE, the number theoretic transform (NTT) plays a pivotal role in determining overall system performance. We target the butterfly computation pattern, which is central to the NTT algorithm, and propose a high-efficiency NTT accelerator architecture based on a dataflow computing model. First, we design an RVFHE extension instruction set tailored for NTT butterfly operations, incorporating custom modular multiplication and modular addition/subtraction units to enhance the efficiency of modular arithmetic. Second, we introduce a novel NTT data reordering scheme, combined with a structured butterfly address generation strategy, to reduce the control complexity and access conflicts associated with cross-row and cross-column data exchanges. Finally, we develop a dataflow-driven NTT accelerator architecture that leverages data dependency-triggered execution to enable efficient on-chip scheduling and data reuse, thereby exploiting instruction-level parallelism to the fullest extent. Experimental results demonstrate that, compared with NVIDIA GPU, the proposed architecture achieves up to 8.96 times speedup and 8.53 times improvement in energy efficiency. Furthermore, compared with state-of-the-art dedicated NTT accelerators, our design delivers a 1.37 times performance gain.

  • [1]
    Gentry C. A fully homomorphic encryption scheme[D]. Palo Alto, CA: Stanford University, 2009
    [2]
    Feldmann A, Samardzic N, Krastev A, et al. An architecture to accelerate computation on encrypted data[J]. IEEE Micro, 2022, 42(4): 59−68 doi: 10.1109/MM.2022.3170792
    [3]
    Gentry C. Fully homomorphic encryption using ideal lattices[C]//Proc of Symp on the Theory of Computing. New York: ACM, 2009: 169−178
    [4]
    Smart N, Vercauteren F. Fully homomorphic encryption with relatively small key and ciphertext sizes[C]//Proc of Public Key Cryptography–PKC 2010. Berlin: Springer, 2010: 420−443
    [5]
    Brakerski Z, Gentry C, Vaikuntanathan V. (Leveled) Fully homomorphic encryption without bootstrapping[J]. ACM Transactions on Computation Theory-Special Issue on Innovations in Theoretical Computer Science 2012- Part II, 2014, 6(3): 1−36
    [6]
    Bos J, Lauter K, Loftus J, et al. Improved security for a ring-based fully homomorphic encryption scheme[C]//Proc of Cryptography and Coding. IMACC 2013. Berlin: Springer, 2013: 45−64
    [7]
    Brakerski Z. Fully homomorphic encryption without modulus switching from classical GapSVP[C]//Advances in Cryptology–CRYPTO 2012. Berlin: Springer, 2012: 868−886
    [8]
    Gentry C, Halevi S. Implementing Gentry’s fully-homomorphic encryption scheme[C]//Advances in Cryptology–EUROCRYPT 2011. EUROCRYPT 2011. Berlin: Springer, 2011: 129−148
    [9]
    Erlingsson L, Pihur V, Korolova A. Rappor: Randomized aggregatable privacy-preserving ordinal response[C]//Proc of ACM Conf on Computer and Communications Security (CCS). New York: ACM, 2014: 1054-1067
    [10]
    Gentry C. Computing arbitrary functions of encrypted data[J]. Communications of ACM, 2010, 53(3): 97−105 doi: 10.1145/1666420.1666444
    [11]
    Gentry C, Halevi S, Smart N. Fully homomorphic encryption with polylog overhead[C]//Proc of Annual Int Conf on the Theory and Applications of Cryptographic Techniques. Berlin: Springer, 2012: 465–482
    [12]
    Gentry C, Sahai A, Waters B. Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based[C]//Advances in Cryptology–CRYPTO 2013. Berlin: Springer, 2013: 75−92
    [13]
    Cheon J, Kim A, Kim M, et al. Homomorphic encryption for arithmetic of approximate numbers[C]//Advances in Cryptology–ASIACRYPT 2017. Berlin: Springer, 2017: 409−437
    [14]
    Fan Shengyu, Wang Zhiwei, Xu Weizhi, et al. TensorFHE: Achieving practical computation on encrypted data using GPGPU[C]//Proc of 2023 IEEE Int Symp on High-Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2023: 922−934
    [15]
    Akleylek S, Özgur D, Zaliha Y. On the efficiency of polynomial multiplication for lattice-based cryptography on GPUs using CUDA[C]//Proc of Int Conf on Cryptography and Information Security in the Balkans. Berlin: Springer, 2015: 155−168
    [16]
    Badawi A, Veeravalli B, Mun C, et al. High performance FV somewhat homomorphic encryption on GPUs: An implementation using CUDA[J]. Transactions on Cryptographic Hardware and Embedded Systems, 2018(2): 70−95
    [17]
    Dai W, Sunar B. cuHE: A homomorphic encryption accelerator library[C]//Proc of Cryptography and Information Security in the Balkans (BalkanCryptSec 2015). Berlin: Springer, 2015: 169−186
    [18]
    Riazi M, Laine K, Pelton B, et al. HEAX: An architecture for computing on encrypted data[C]//Proc of the 25th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2020: 1295−1309
    [19]
    Roy S, Turan F, Jarvinen K, et al. FPGA-based high-performance parallel architecture for homomorphic computing on encrypted data[C]//Proc of 2019 IEEE Int Symp on High Performance Computer Architecture (HPCA), Piscataway, NJ: IEEE, 2019: 387−398
    [20]
    Pöppelmann T, Naehrig M, Putnam A, et al. Accelerating homomorphic evaluation on reconfigurable hardware[C]//Proc of Cryptographic Hardware and Embedded Systems (CHES 2015). Berlin: Springer, 2015: 143−163
    [21]
    Feldmann A, Samardzic N, Krastev A. F1: A fast and programmable accelerator for fully homomorphic encryption[C]//Proc of the 54th Annual IEEE/ACM Int Symp on Microarchitecture (MICRO 2021). New York: ACM, 2021: 238−252
    [22]
    Samardzic N, Feldmann A, Krastev A. CraterLake: A hardware accelerator for efficient unbounded computation on encrypted data[C]//Proc of Int Symp on Computer Architecture. New York : ACM, 2022: 173−187
    [23]
    Karabulut E, Aysu A. RANTT: A RISC-V architecture extension for the number theoretic transform[C]//Proc of the 30th Int Conf on Field-Programmable Logic and Applications (FPL). New York : ACM, 2020: 26−32
    [24]
    Paludo R, Sousa L. NTT architecture for a Linux-ready RISC-V fully-homomorphic encryption accelerator[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(7): 2669−2682 doi: 10.1109/TCSI.2022.3166550
    [25]
    Lu Zhaojun, Yu Weizong, Xu Peng, et al. An NTT/INTT accelerator with ultra-high throughput and area efficiency for FHE[C]//Proc of the 61st ACM/IEEE Design Automation Conf (DAC’24). Association for Computing Machinery. New York: ACM, 2024: 1−6
    [26]
    Dennis J B. First version of a dataflow procedure language[C]//Proc of Programming Symp. Berlin: Springer, 1974, 19: 362−376
    [27]
    Dijk M, Gentry C, Halevi S, et al. Fully homomorphic encryption over the integers[C]//Proc of Int Conf on Theory & Applications of Cryptographic Techniques. Berlin: Springer, 2010: 24−43
    [28]
    Gentry C, Halevi S. Implementing Gentry’s fully-homomorphic encryption scheme[C]//Advances in Cryptology–EUROCRYPT 2011. Berlin: Springer, 2011: 129−148
    [29]
    Krendelev S, Tormasov A. Method for protecting data used in cloud computing with homomorphic encryption: US10116437B1 [P]. 2018-10-30
    [30]
    Zhang Y, Dai W, Jiang X, et al. FORESEE: Fully outsourced secure genome study based on homomorphic encryption[J/OL]. BMC Medical Informatics & Decision Making, 2015[2025-03-01]. http://doi.org/10.1186/1472-6947-15-s5-s5
    [31]
    Lagendijk R, Erkin Z, Barni M, Encrypted signal processing for privacy protection: Conveying the utility of homomorphic encryption and multiparty computation[J]. IEEE Signal Processing Magazine, 2013, 30(1): 82−105
    [32]
    Gentry C, Halevi S, Smart N P. Homomorphic evaluation of the AES circuit[C]//Advances in Cryptology–CRYPTO 2012. Berlin: Springer, 2012: 850−867
    [33]
    Asanović K, Avižienis R, Bachrach J, et al. The rocket chip generator[R]. Berkeley: University of California, 2016: 1−11
    [34]
    Ye Xiaochun, Fan Dongrui, Sun Ninghui, et al. SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture[C]//Proc of the Int Symp on Low Power Electronics and Design (ISLPED). New York: ACM, 2013: 273−278
    [35]
    Fan Zhihua, Li Wenming, Tang Shengzhong, et al. Improving utilization of dataflow architectures through software and hardware co-design[C]//Proc of Parallel Processing (Euro-Par 2023). Berlin: Springer, 2023: 245−259
  • Related Articles

    [1]Wang Honglin, Yang Dan, Nie Tiezheng, Kou Yue. Attributed Heterogeneous Information Network Embedding with Self-Attention Mechanism for Product Recommendation[J]. Journal of Computer Research and Development, 2022, 59(7): 1509-1521. DOI: 10.7544/issn1000-1239.20210016
    [2]Ni Qingjian, Peng Wenqiang, Zhang Zhizheng, Zhai Yuqing. Spatial-Temporal Graph Neural Network for Traffic Flow Prediction Based on Information Enhanced Transmission[J]. Journal of Computer Research and Development, 2022, 59(2): 282-293. DOI: 10.7544/issn1000-1239.20210901
    [3]Cao Jiuxin, Gao Qingqing, Xia Rongqing, Liu Weijia, Zhu Xuelin, Liu Bo. Information Propagation Prediction and Specific Information Suppression in Social Networks[J]. Journal of Computer Research and Development, 2021, 58(7): 1490-1503. DOI: 10.7544/issn1000-1239.2021.20200809
    [4]Zhou Donghao, Han Wenbao, Wang Yongjun. A Fine-Grained Information Diffusion Model Based on Node Attributes and Content Features[J]. Journal of Computer Research and Development, 2015, 52(1): 156-166. DOI: 10.7544/issn1000-1239.2015.20130915
    [5]Ma Xiao, Wang Xuan, and Wang Xiaolong. The Information Model for a Class of Imperfect Information Game[J]. Journal of Computer Research and Development, 2010, 47(12).
    [6]Sun Qindong, Guan Xiaohong, Zhou Yadong. A Survey of Network Information Content Audit[J]. Journal of Computer Research and Development, 2009, 46(8): 1241-1250.
    [7]Tian Mei, Luo Siwei, Huang Yaping, and Zhao Jiali. Extracting Bottom-Up Attention Information Based on Local Complexity and Early Visual Features[J]. Journal of Computer Research and Development, 2008, 45(10): 1739-1746.
    [8]Ni Weiwei, Chen Geng, Lu Jieping, Wu Yingjie, Sun Zhihui. Local Entropy Based Weighted Subspace Outlier Mining Algorithm[J]. Journal of Computer Research and Development, 2008, 45(7): 1189-1194.
    [9]Liu Yunhui, Luo Siwei, Huang Hua, and Li Aijun. Information Geometric Analysis of Pruning Algorithm[J]. Journal of Computer Research and Development, 2006, 43(9): 1609-1614.
    [10]Dong Wenyu, Sun Donghong, Xu Ke, Li Xuedong. Modeling of Autonomous Network Information Service[J]. Journal of Computer Research and Development, 2006, 43(2): 224-230.

Catalog

    Article views (49) PDF downloads (20) Cited by()

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return