Citation: | Shi Hongbo, Fan Zhihua, Li Wenming, Zhang Zhiyuan, Mu Yudong, Ye Xiaochun, An Xuejun. NTT Butterfly Arithmetic Acceleration Based on Dataflow Architecture[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550160 |
Fully homomorphic encryption (FHE), which enables computation on encrypted data without decryption throughout the entire processing flow, offers a promising solution for privacy preservation in cloud computing and other distributed environments. However, the practical deployment of FHE remains significantly constrained by its high computational complexity, poor data locality, and limited parallelism. Among the core operations in FHE, the number theoretic transform (NTT) plays a pivotal role in determining overall system performance. This paper targets the butterfly computation pattern, which is central to the NTT algorithm, and proposes a high-efficiency NTT accelerator architecture based on a dataflow computing model. First, we design an RVFHE extension instruction set tailored for NTT butterfly operations, incorporating custom modular multiplication and modular addition/subtraction units to enhance the efficiency of modular arithmetic. Second, we introduce a novel NTT data reordering scheme, combined with a structured butterfly address generation strategy, to reduce the control complexity and access conflicts associated with cross-row and cross-column data exchanges. Finally, we develop a dataflow-driven NTT accelerator architecture that leverages data dependency-triggered execution to enable efficient on-chip scheduling and data reuse, thereby exploiting instruction-level parallelism to the fullest extent. Experimental results demonstrate that, compared to NVIDIA GPU, the proposed architecture achieves up to 8.96× speedup and 8.53× improvement in energy efficiency. Furthermore, compared to state-of-the-art dedicated NTT accelerators, our design delivers a 1.37× performance gain.
[1] |
Gentry C. A Fully Homomorphic Encryption Scheme[D]. CA, USA: STANFORD UNIVERSITY, 2009
|
[2] |
Feldmann A, Samardzic N, Krastev A, et al. An Architecture to Accelerate Computation on Encrypted Data[J]. IEEE Micro, 2022, 42(4): 59−68 doi: 10.1109/MM.2022.3170792
|
[3] |
Gentry C. Fully homomorphic encryption using ideal lattices[J]. Stoc, 2009: 169−178
|
[4] |
Smart N, Vercauteren F. Fully homomorphic encryption with relatively small key and ciphertext sizes[C]//Proc of Public Key Cryptography– PKC 2010. Lecture Notes in Computer Science, Berlin: Springer, 2010: 420−443
|
[5] |
Brakerski Z, Gentry C, Vaikuntanathan V. (Leveled) Fully Homomorphic Encryption without Bootstrapping[J]. ACM Transactions on Computation Theory-Special Issue on Innovations in Theoretical Computer Science 2012- Part II, 2014, 6(3): 1−36
|
[6] |
Bos J, Lauter K, Loftus J, et al. Improved Security for a Ring-Based Fully Homomorphic Encryption Scheme[C]//Proc of Cryptography and Coding. IMACC 2013. Lecture Notes in Computer Science, Berlin: Springer, 2013: 45−64
|
[7] |
Brakerski Z. Fully homomorphic encryption without modulus switching from classical GapSVP[C]//Advances in Cryptology–CRYPTO 2012. Lecture Notes in Computer Science, Berlin: Springer, 2012: 868−886
|
[8] |
Gentry C, Halevi S. Implementing Gentry’s fully-homomorphic encryption scheme[C]//Advances in Cryptology–EUROCRYPT 2011. EUROCRYPT 2011. Lecture Notes in Computer Science, Berlin: Springer, 2011: 129−148
|
[9] |
Erlingsson L, Pihur V, Korolova A. Rappor: Randomized aggregatable privacy-preserving ordinal response[C]//Proc of ACM Conf on Computer and Communications Security (CCS). Scottsdale, Arizona, USA, 2014
|
[10] |
Gentry C. Computing arbitrary functions of encrypted data[J]. Commun. ACM, 2010, 53(3): 97−105 doi: 10.1145/1666420.1666444
|
[11] |
Gentry C, Halevi S, Smart N. Fully homomorphic encryption with polylog overhead[C]//Proc of Annual Int Conf on the Theory and Applications of Cryptographic Techniques. Berlin: Springer, 2012: 465–482
|
[12] |
Gentry C, Sahai A, and Waters B. Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based[C]//Advances in Cryptology–CRYPTO 2013. Berlin: Springer, 2013: 75−92
|
[13] |
Cheon J , Kim A , Kim M , et al. Homomorphic Encryption for Arithmetic of Approximate Numbers[C]//Advances in Cryptology–ASIACRYPT 2017. Lecture Notes in Computer Science. Berlin: Springer, 2017: 409−437
|
[14] |
Fan Shengyu, Wang Zhiwei, Xu Weizhi, et al. TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU[C]//Proc of 2023 IEEE Int Symp on High-Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2023: 922−934
|
[15] |
Akleylek S, Özgur D, and Zaliha Y. On the efficiency of polynomial multiplication for lattice-based cryptography on GPUs using CUDA[C]//Proc of Int Conf on Cryptography and Information Security in the Balkans. Berlin: Springer, 2015: 155−168
|
[16] |
Badawi A, Veeravalli B, Mun C, et al. Highperformance FV somewhat homomorphic encryption on GPUs: An implementation using CUDA[J]. IACR: Transactions on Cryptographic Hardware and Embedded Systems, 2018(2): 70−95
|
[17] |
Dai W and Sunar B. cuHE: A homomorphic encryption accelerator library[C]//Proc of Cryptography and Information Security in the Balkans. BalkanCryptSec 2015. Lecture Notes in Computer Science, Berlin: Springer, 2015: 169−186
|
[18] |
Riazi M, Laine K, Pelton B, et al. HEAX: An architecture for computing on encrypted data[C]//Proc of the 25th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2020: 1295−1309
|
[19] |
Roy S, Turan F, Jarvinen K, et al. FPGA-based high-performance parallel architecture for homomorphic computing on encrypted data[C]//Proc of 2019 IEEE Int Symp on High Performance Computer Architecture (HPCA), Washington, DC, USA, 2019: 387−398
|
[20] |
Pöppelmann T, Naehrig M, Putnam A, et al. Accelerating homomorphic evaluation on reconfigurable hardware[C]//Proc of Cryptographic Hardware and Embedded Systems -CHES 2015. Lecture Notes in Computer Science, Berlin: Springer, 2015: 143−163
|
[21] |
Feldmann A, Samardzic N, Krastev A. F1: A fast and programmable accelerator for fully homomorphic encryption[C]//Proc of the 54th Annual IEEE/ACM Int Symp on Microarchitecture, 2021MICRO, New York: ACM, 2021: 238−252
|
[22] |
Samardzic N, Feldmann A, Krastev A. Craterlake: a hardware accelerator for efficient unbounded computation on encrypted data[C]//Proc of Int Sympon Computer Architecture. New York : ACM, 2022: 173−187
|
[23] |
Karabulut E, Aysu A. RANTT: A RISC-V Architecture Extension for the Number Theoretic Transform[C]//Proc of 2020 30th Int Conf on Field-Programmable Logic and Applications (FPL), New York : ACM, 2020: 26−32
|
[24] |
Paludo R, Sousa L. NTT Architecture for a Linux-Ready RISC-V Fully-Homomorphic Encryption Accelerator[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(7): 2669−2682 doi: 10.1109/TCSI.2022.3166550
|
[25] |
Zhaojun Lu, Weizong Yu, Peng Xu, et al. An NTT/INTT accelerator with ultra-high throughput and area efficiency for FHE[C]//Proc of the 61st ACM/IEEE Design Automation Conference (DAC’24). Association for Computing Machinery, New York: ACM, 2024: 1−6
|
[26] |
Jack B. Dennis. First version of a dataflow procedure language[C]//Proc of Programming Symp. Lecture Notes in Computer Science. Berlin: Springer, 1974, 19: 362−376
|
[27] |
Dijk M , Gentry C , Halevi S , et al. Fully Homomorphic Encryption over the Integers[C]//Proc of Int Conf on Theory & Applications of Cryptographic Techniques. Berlin: Springer, 2010: 24−43
|
[28] |
Gentry C, Halevi S. Implementing Gentry’s Fully-Homomorphic Encryption Scheme[C]//Advances in Cryptology–EUROCRYPT 2011. Lecture Notes in Computer Science, Berlin: Springer, 2011: 129−148
|
[29] |
Krendelev S, Tormasov A. Method for protecting data used in cloud computing with homomorphic encryption: US10116437B1 [P]. 2018-10-30
|
[30] |
Zhang Y, Dai W, Jiang X, et al. FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption[J]. BMC Medical Informatics&Decision Making, 2015
|
[31] |
Lagendijk R, Erkin Z, Barni M, Encrypted signal processing for privacy protection: Conveying the utility of homomorphic encryption and multiparty computation[J]. Signal Processing Magazine IEEE, 2013, 30(1): 82−105
|
[32] |
Gentry C , Halevi S , Smart N P . Homomorphic Evaluation of the AES Circuit[C]//Advances in Cryptology – CRYPTO 2012. Lecture Notes in Computer Science. Berlin: Springer, 2012: 850−867
|
[33] |
Asanović K, Avižienis R, Bachrach J, et al. The Rocket Chip Generator[R]. Berkeley: University of California, 2016: 1−11
|
[34] |
Ye Xiaochun , Fan Dongrui , Sun Ninghui , et al. SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture[C]//Proc of the Int Symp on Low Power Electronics and Design (ISLPED), New York: ACM, 2013: 273−278
|
[35] |
Fan Zhihua, Li Wenming, Tang Shengzhong, et al. Improving utilization of dataflow architectures through software and hardware co-design[C]//Proc of Parallel Processing. Euro-Par 2023. Lecture Notes in Computer Science, Cham: Springer, 2023: 245−259
|
[1] | Li Song, Cao Wenqi, Hao Xiaohong, Zhang Liping, Hao Zhongxiao. Collective Spatial Keyword Query Based on Time-Distance Constrained and Cost Aware[J]. Journal of Computer Research and Development, 2025, 62(3): 808-819. DOI: 10.7544/issn1000-1239.202330815 |
[2] | Wang Kaifan, Xu Yinan, Yu Zihao, Tang Dan, Chen Guokai, Chen Xi, Gou Lingrui, Hu Xuan, Jin Yue, Li Qianruo, Li Xin, Lin Jiawei, Liu Tong, Liu Zhigang, Wang Huaqiang, Wang Huizhe, Zhang Chuanqi, Zhang Fawang, Zhang Linjuan, Zhang Zifei, Zhang Ziyue, Zhao Yangyang, Zhou Yaoyang, Zou Jiangrui, Cai Ye, Huan Dandan, Li Zusong, Zhao Jiye, He Wei, Sun Ninghui, Bao Yungang. XiangShan Open-Source High Performance RISC-V Processor Design and Implementation[J]. Journal of Computer Research and Development, 2023, 60(3): 476-493. DOI: 10.7544/issn1000-1239.202221036 |
[3] | Ren Hao, Liu Baisong, Sun Jinyang, Dong Qian, Qian Jiangbo. A Time and Relation-Aware Graph Collaborative Filtering for Cross-Domain Sequential Recommendation[J]. Journal of Computer Research and Development, 2023, 60(1): 112-124. DOI: 10.7544/issn1000-1239.202110545 |
[4] | Zhang Tong, Feng Jiaqi, Ma Yanying, Qu Siyuan, Ren Fengyuan. Survey on Traffic Scheduling in Time-Sensitive Networking[J]. Journal of Computer Research and Development, 2022, 59(4): 747-764. DOI: 10.7544/issn1000-1239.20210203 |
[5] | Cui Yuanning, Li Jing, Shen Li, Shen Yang, Qiao Lin, Bo Jue. Duration-HyTE: A Time-Aware Knowledge Representation Learning Method Based on Duration Modeling[J]. Journal of Computer Research and Development, 2020, 57(6): 1239-1251. DOI: 10.7544/issn1000-1239.2020.20190253 |
[6] | Zheng Xiao, Gao Han, Wang Xiujun, Qin Feng. Contact Duration Aware Cooperative Data Caching in Mobile Opportunistic Networks[J]. Journal of Computer Research and Development, 2018, 55(2): 338-345. DOI: 10.7544/issn1000-1239.2018.20160929 |
[7] | Wang Chong, Lü Yinrun, Chen Li, Wang Xiuli, Wang Yongji. Survey on Development of Solving Methods and State-of-the-Art Applications of Satisfiability Modulo Theories[J]. Journal of Computer Research and Development, 2017, 54(7): 1405-1425. DOI: 10.7544/issn1000-1239.2017.20160303 |
[8] | Chen Huangke, Zhu Jianghan, Zhu Xiaomin, Ma Manhao, Zhang Zhenshi. Resource-Delay-Aware Scheduling for Real-Time Tasks in Clouds[J]. Journal of Computer Research and Development, 2017, 54(2): 446-456. DOI: 10.7544/issn1000-1239.2017.20151123 |
[9] | Zhou Hang, Huang Zhiqiu, Zhu Yi, Xia Liang, Liu Linyuan. Real-Time Systems Contact Checking and Resolution Based on Time Petri Net[J]. Journal of Computer Research and Development, 2012, 49(2): 413-420. |
[10] | Zhou Hang, Huang Zhiqiu, Hu Jun, Zhu Yi. Real-Time System Resource Conflict Checking Based on Time Petri Nets[J]. Journal of Computer Research and Development, 2009, 46(9): 1578-1585. |