• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search
Shi Hongbo, Fan Zhihua, Li Wenming, Zhang Zhiyuan, Mu Yudong, Ye Xiaochun, An Xuejun. NTT Butterfly Arithmetic Acceleration Based on Dataflow Architecture[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550160
Citation: Shi Hongbo, Fan Zhihua, Li Wenming, Zhang Zhiyuan, Mu Yudong, Ye Xiaochun, An Xuejun. NTT Butterfly Arithmetic Acceleration Based on Dataflow Architecture[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550160

NTT Butterfly Arithmetic Acceleration Based on Dataflow Architecture

Funds: This work was supported in part by National Key R&D Program of China (2023YFB45003500), the Beijing Nova Program (20220484054, 20230484420), the Beijing Natural Science Foundation (L234078), and the CAS Project for Youth Innovation Promotion Association.
More Information
  • Author Bio:

    Shi Hongbo: born in 2002. Bachelor. His research interests include dataflow architecture and high-performance computing

    Fan Zhihua: born in 1996. PhD. Assistant professor. His main research interests include dataflow architecture, programming model, and reconfigurable architecture

    Li Wenming: born in 1988. PhD. Associate professor. His main research interests include high-throughput processor architecture, dataflow architecture, and software simulation

    Zhang Zhiyuan: born in 2000. PhD candidate. His main research interests include high-performance computing, dataflow, and reconfigurable architecture

    Mu Yudong: born in 2001. PhD candidate. His main research interests include dataflow architecture, dataflow graph mapping, and reconfigurable architecture

    Ye Xiaochun: born in 1983. PhD,professor. His main research interests include high-performance computer architecture and software simulation

    An Xuejun: born in 1966. PhD,professor. His main research interests include programming model and processor architecture

  • Received Date: February 28, 2025
  • Revised Date: April 07, 2025
  • Available Online: April 16, 2025
  • Fully homomorphic encryption (FHE), which enables computation on encrypted data without decryption throughout the entire processing flow, offers a promising solution for privacy preservation in cloud computing and other distributed environments. However, the practical deployment of FHE remains significantly constrained by its high computational complexity, poor data locality, and limited parallelism. Among the core operations in FHE, the number theoretic transform (NTT) plays a pivotal role in determining overall system performance. This paper targets the butterfly computation pattern, which is central to the NTT algorithm, and proposes a high-efficiency NTT accelerator architecture based on a dataflow computing model. First, we design an RVFHE extension instruction set tailored for NTT butterfly operations, incorporating custom modular multiplication and modular addition/subtraction units to enhance the efficiency of modular arithmetic. Second, we introduce a novel NTT data reordering scheme, combined with a structured butterfly address generation strategy, to reduce the control complexity and access conflicts associated with cross-row and cross-column data exchanges. Finally, we develop a dataflow-driven NTT accelerator architecture that leverages data dependency-triggered execution to enable efficient on-chip scheduling and data reuse, thereby exploiting instruction-level parallelism to the fullest extent. Experimental results demonstrate that, compared to NVIDIA GPU, the proposed architecture achieves up to 8.96× speedup and 8.53× improvement in energy efficiency. Furthermore, compared to state-of-the-art dedicated NTT accelerators, our design delivers a 1.37× performance gain.

  • [1]
    Gentry C. A Fully Homomorphic Encryption Scheme[D]. CA, USA: STANFORD UNIVERSITY, 2009
    [2]
    Feldmann A, Samardzic N, Krastev A, et al. An Architecture to Accelerate Computation on Encrypted Data[J]. IEEE Micro, 2022, 42(4): 59−68 doi: 10.1109/MM.2022.3170792
    [3]
    Gentry C. Fully homomorphic encryption using ideal lattices[J]. Stoc, 2009: 169−178
    [4]
    Smart N, Vercauteren F. Fully homomorphic encryption with relatively small key and ciphertext sizes[C]//Proc of Public Key Cryptography– PKC 2010. Lecture Notes in Computer Science, Berlin: Springer, 2010: 420−443
    [5]
    Brakerski Z, Gentry C, Vaikuntanathan V. (Leveled) Fully Homomorphic Encryption without Bootstrapping[J]. ACM Transactions on Computation Theory-Special Issue on Innovations in Theoretical Computer Science 2012- Part II, 2014, 6(3): 1−36
    [6]
    Bos J, Lauter K, Loftus J, et al. Improved Security for a Ring-Based Fully Homomorphic Encryption Scheme[C]//Proc of Cryptography and Coding. IMACC 2013. Lecture Notes in Computer Science, Berlin: Springer, 2013: 45−64
    [7]
    Brakerski Z. Fully homomorphic encryption without modulus switching from classical GapSVP[C]//Advances in Cryptology–CRYPTO 2012. Lecture Notes in Computer Science, Berlin: Springer, 2012: 868−886
    [8]
    Gentry C, Halevi S. Implementing Gentry’s fully-homomorphic encryption scheme[C]//Advances in Cryptology–EUROCRYPT 2011. EUROCRYPT 2011. Lecture Notes in Computer Science, Berlin: Springer, 2011: 129−148
    [9]
    Erlingsson L, Pihur V, Korolova A. Rappor: Randomized aggregatable privacy-preserving ordinal response[C]//Proc of ACM Conf on Computer and Communications Security (CCS). Scottsdale, Arizona, USA, 2014
    [10]
    Gentry C. Computing arbitrary functions of encrypted data[J]. Commun. ACM, 2010, 53(3): 97−105 doi: 10.1145/1666420.1666444
    [11]
    Gentry C, Halevi S, Smart N. Fully homomorphic encryption with polylog overhead[C]//Proc of Annual Int Conf on the Theory and Applications of Cryptographic Techniques. Berlin: Springer, 2012: 465–482
    [12]
    Gentry C, Sahai A, and Waters B. Homomorphic encryption from learning with errors: Conceptually-simpler, asymptotically-faster, attribute-based[C]//Advances in Cryptology–CRYPTO 2013. Berlin: Springer, 2013: 75−92
    [13]
    Cheon J , Kim A , Kim M , et al. Homomorphic Encryption for Arithmetic of Approximate Numbers[C]//Advances in Cryptology–ASIACRYPT 2017. Lecture Notes in Computer Science. Berlin: Springer, 2017: 409−437
    [14]
    Fan Shengyu, Wang Zhiwei, Xu Weizhi, et al. TensorFHE: Achieving Practical Computation on Encrypted Data Using GPGPU[C]//Proc of 2023 IEEE Int Symp on High-Performance Computer Architecture (HPCA). Piscataway, NJ: IEEE, 2023: 922−934
    [15]
    Akleylek S, Özgur D, and Zaliha Y. On the efficiency of polynomial multiplication for lattice-based cryptography on GPUs using CUDA[C]//Proc of Int Conf on Cryptography and Information Security in the Balkans. Berlin: Springer, 2015: 155−168
    [16]
    Badawi A, Veeravalli B, Mun C, et al. Highperformance FV somewhat homomorphic encryption on GPUs: An implementation using CUDA[J]. IACR: Transactions on Cryptographic Hardware and Embedded Systems, 2018(2): 70−95
    [17]
    Dai W and Sunar B. cuHE: A homomorphic encryption accelerator library[C]//Proc of Cryptography and Information Security in the Balkans. BalkanCryptSec 2015. Lecture Notes in Computer Science, Berlin: Springer, 2015: 169−186
    [18]
    Riazi M, Laine K, Pelton B, et al. HEAX: An architecture for computing on encrypted data[C]//Proc of the 25th Int Conf on Architectural Support for Programming Languages and Operating Systems. New York: ACM, 2020: 1295−1309
    [19]
    Roy S, Turan F, Jarvinen K, et al. FPGA-based high-performance parallel architecture for homomorphic computing on encrypted data[C]//Proc of 2019 IEEE Int Symp on High Performance Computer Architecture (HPCA), Washington, DC, USA, 2019: 387−398
    [20]
    Pöppelmann T, Naehrig M, Putnam A, et al. Accelerating homomorphic evaluation on reconfigurable hardware[C]//Proc of Cryptographic Hardware and Embedded Systems -CHES 2015. Lecture Notes in Computer Science, Berlin: Springer, 2015: 143−163
    [21]
    Feldmann A, Samardzic N, Krastev A. F1: A fast and programmable accelerator for fully homomorphic encryption[C]//Proc of the 54th Annual IEEE/ACM Int Symp on Microarchitecture, 2021MICRO, New York: ACM, 2021: 238−252
    [22]
    Samardzic N, Feldmann A, Krastev A. Craterlake: a hardware accelerator for efficient unbounded computation on encrypted data[C]//Proc of Int Sympon Computer Architecture. New York : ACM, 2022: 173−187
    [23]
    Karabulut E, Aysu A. RANTT: A RISC-V Architecture Extension for the Number Theoretic Transform[C]//Proc of 2020 30th Int Conf on Field-Programmable Logic and Applications (FPL), New York : ACM, 2020: 26−32
    [24]
    Paludo R, Sousa L. NTT Architecture for a Linux-Ready RISC-V Fully-Homomorphic Encryption Accelerator[J]. IEEE Transactions on Circuits and Systems I: Regular Papers, 2022, 69(7): 2669−2682 doi: 10.1109/TCSI.2022.3166550
    [25]
    Zhaojun Lu, Weizong Yu, Peng Xu, et al. An NTT/INTT accelerator with ultra-high throughput and area efficiency for FHE[C]//Proc of the 61st ACM/IEEE Design Automation Conference (DAC’24). Association for Computing Machinery, New York: ACM, 2024: 1−6
    [26]
    Jack B. Dennis. First version of a dataflow procedure language[C]//Proc of Programming Symp. Lecture Notes in Computer Science. Berlin: Springer, 1974, 19: 362−376
    [27]
    Dijk M , Gentry C , Halevi S , et al. Fully Homomorphic Encryption over the Integers[C]//Proc of Int Conf on Theory & Applications of Cryptographic Techniques. Berlin: Springer, 2010: 24−43
    [28]
    Gentry C, Halevi S. Implementing Gentry’s Fully-Homomorphic Encryption Scheme[C]//Advances in Cryptology–EUROCRYPT 2011. Lecture Notes in Computer Science, Berlin: Springer, 2011: 129−148
    [29]
    Krendelev S, Tormasov A. Method for protecting data used in cloud computing with homomorphic encryption: US10116437B1 [P]. 2018-10-30
    [30]
    Zhang Y, Dai W, Jiang X, et al. FORESEE: Fully Outsourced secuRe gEnome Study basEd on homomorphic Encryption[J]. BMC Medical Informatics&Decision Making, 2015
    [31]
    Lagendijk R, Erkin Z, Barni M, Encrypted signal processing for privacy protection: Conveying the utility of homomorphic encryption and multiparty computation[J]. Signal Processing Magazine IEEE, 2013, 30(1): 82−105
    [32]
    Gentry C , Halevi S , Smart N P . Homomorphic Evaluation of the AES Circuit[C]//Advances in Cryptology – CRYPTO 2012. Lecture Notes in Computer Science. Berlin: Springer, 2012: 850−867
    [33]
    Asanović K, Avižienis R, Bachrach J, et al. The Rocket Chip Generator[R]. Berkeley: University of California, 2016: 1−11
    [34]
    Ye Xiaochun , Fan Dongrui , Sun Ninghui , et al. SimICT: A fast and flexible framework for performance and power evaluation of large-scale architecture[C]//Proc of the Int Symp on Low Power Electronics and Design (ISLPED), New York: ACM, 2013: 273−278
    [35]
    Fan Zhihua, Li Wenming, Tang Shengzhong, et al. Improving utilization of dataflow architectures through software and hardware co-design[C]//Proc of Parallel Processing. Euro-Par 2023. Lecture Notes in Computer Science, Cham: Springer, 2023: 245−259
  • Related Articles

    [1]Li Song, Cao Wenqi, Hao Xiaohong, Zhang Liping, Hao Zhongxiao. Collective Spatial Keyword Query Based on Time-Distance Constrained and Cost Aware[J]. Journal of Computer Research and Development, 2025, 62(3): 808-819. DOI: 10.7544/issn1000-1239.202330815
    [2]Wang Kaifan, Xu Yinan, Yu Zihao, Tang Dan, Chen Guokai, Chen Xi, Gou Lingrui, Hu Xuan, Jin Yue, Li Qianruo, Li Xin, Lin Jiawei, Liu Tong, Liu Zhigang, Wang Huaqiang, Wang Huizhe, Zhang Chuanqi, Zhang Fawang, Zhang Linjuan, Zhang Zifei, Zhang Ziyue, Zhao Yangyang, Zhou Yaoyang, Zou Jiangrui, Cai Ye, Huan Dandan, Li Zusong, Zhao Jiye, He Wei, Sun Ninghui, Bao Yungang. XiangShan Open-Source High Performance RISC-V Processor Design and Implementation[J]. Journal of Computer Research and Development, 2023, 60(3): 476-493. DOI: 10.7544/issn1000-1239.202221036
    [3]Ren Hao, Liu Baisong, Sun Jinyang, Dong Qian, Qian Jiangbo. A Time and Relation-Aware Graph Collaborative Filtering for Cross-Domain Sequential Recommendation[J]. Journal of Computer Research and Development, 2023, 60(1): 112-124. DOI: 10.7544/issn1000-1239.202110545
    [4]Zhang Tong, Feng Jiaqi, Ma Yanying, Qu Siyuan, Ren Fengyuan. Survey on Traffic Scheduling in Time-Sensitive Networking[J]. Journal of Computer Research and Development, 2022, 59(4): 747-764. DOI: 10.7544/issn1000-1239.20210203
    [5]Cui Yuanning, Li Jing, Shen Li, Shen Yang, Qiao Lin, Bo Jue. Duration-HyTE: A Time-Aware Knowledge Representation Learning Method Based on Duration Modeling[J]. Journal of Computer Research and Development, 2020, 57(6): 1239-1251. DOI: 10.7544/issn1000-1239.2020.20190253
    [6]Zheng Xiao, Gao Han, Wang Xiujun, Qin Feng. Contact Duration Aware Cooperative Data Caching in Mobile Opportunistic Networks[J]. Journal of Computer Research and Development, 2018, 55(2): 338-345. DOI: 10.7544/issn1000-1239.2018.20160929
    [7]Wang Chong, Lü Yinrun, Chen Li, Wang Xiuli, Wang Yongji. Survey on Development of Solving Methods and State-of-the-Art Applications of Satisfiability Modulo Theories[J]. Journal of Computer Research and Development, 2017, 54(7): 1405-1425. DOI: 10.7544/issn1000-1239.2017.20160303
    [8]Chen Huangke, Zhu Jianghan, Zhu Xiaomin, Ma Manhao, Zhang Zhenshi. Resource-Delay-Aware Scheduling for Real-Time Tasks in Clouds[J]. Journal of Computer Research and Development, 2017, 54(2): 446-456. DOI: 10.7544/issn1000-1239.2017.20151123
    [9]Zhou Hang, Huang Zhiqiu, Zhu Yi, Xia Liang, Liu Linyuan. Real-Time Systems Contact Checking and Resolution Based on Time Petri Net[J]. Journal of Computer Research and Development, 2012, 49(2): 413-420.
    [10]Zhou Hang, Huang Zhiqiu, Hu Jun, Zhu Yi. Real-Time System Resource Conflict Checking Based on Time Petri Nets[J]. Journal of Computer Research and Development, 2009, 46(9): 1578-1585.

Catalog

    Article views (39) PDF downloads (16) Cited by()

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return