• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search
Zheng Jieyu, Song Zhenyu, Zhu Haoliang, Zhao Yunlei, Lin Jingqiang, Fan jin. Efficient Software Implementations of NTRU Lattice-Based Key Encapsulation Mechanisms[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440478
Citation: Zheng Jieyu, Song Zhenyu, Zhu Haoliang, Zhao Yunlei, Lin Jingqiang, Fan jin. Efficient Software Implementations of NTRU Lattice-Based Key Encapsulation Mechanisms[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440478

Efficient Software Implementations of NTRU Lattice-Based Key Encapsulation Mechanisms

Funds: This work was supported by the National Key Research and Development Program of China (2022YFB2701601), the General Project of State Key Laboratory of Cryptography(MMKFKT202227), the Shanghai Science and Technology Fund (21DZ2200500), the Shanghai Collaborative Innovation Fund (XTCX-KJ-2023-54), and the Special Fund for Key Technologies in Blockchain of Shanghai Scientific and Technological Committee (23511100300).
More Information
  • Author Bio:

    Zheng Jieyu: born in 2000. PhD candidate. Her research interests include lattice-based cryptography and cryptographic engineering

    Song Zhenyu: born in 1999. Master candidate. His research interests include post-quantum cryptography and cryptographic engineering

    Zhu Haoliang: born in 1999. Master candidate. His research interests include post-quantum cryptography and cryptographic engineering

    Zhao Yunlei: born in 1974. PhD, distinguished professor. His main research interests include post-quantum cryptography, cryptographic protocols, theory of computing

    Lin Jingqiang: born in 1978. PhD, Professor. His main research interests include cryptographic applications, system and network security

    Fan jin: born in 1987. PhD, senior engineer, deputy director of the Sixth Institute of Electronics. His research interests include information security technology research

  • Received Date: June 04, 2024
  • Revised Date: January 08, 2025
  • Accepted Date: January 25, 2025
  • Available Online: January 25, 2025
  • NTRU lattice is an important choice for building a practical post-quantum lattice-based key encapsulation mechanism. The software optimization engineering implementation of lattice cryptography is of great significance for the subsequent application deployment of post-quantum cryptography. CTRU is a lattice-based key encapsulation mechanism based on NTRU lattice proposed by Chinese scholars. At present, there only exists CTRU-768 scheme C and AVX2 implementation, and there is room for further optimization. In addition, the implementation of CTRU-768 cannot be directly extended to the CTRU-512 and CTRU-1024 schemes. This paper completes the first optimized reference C implementation of CTRU-512 and CTRU-1024 schemes and its variant CNTR-512 and CNTR-1024 with and the corresponding AVX2 parallel optimization implementation, and optimizes the existing CTRU-768 reference implementation and AVX2 implementation. It employs mixed radix number theoretic transformation (NTT) to accelerate polynomial multiplication and uses the Karatsuba algorithm to speed up the decomposition of low-degree polynomial ring multiplication. In addition, combined with the central Barrett reduction, this paper proposes index-based delay reduction in reverse NTT. For the time-consuming polynomial inversion under CTRU-1024 scheme, we employ the Bernstein fast inversion algorithm. Furthermore, this paper provides a more efficient AVX2 optimization implementation scheme, which uses the single instruction multiple data (SIMD) instruction set AVX2 proposed by Intel to accelerate the performance bottleneck in CTRU. This paper uses layer merging and coefficient permutation to reduce the load/store instructions. In addition, the Bernstein fast polynomial inversion algorithm is vectorized and optimized using AVX2. We also implement the time-consuming SHA-3 hash module in AVX2 assembly. Compared with the latest CTRU-768 scheme AVX2 implementation, the AVX2 optimized implementation in this paper improves by 8%-11%. For the CTRU scheme, compared with the reference implementation, the performance improvement of the AVX2 optimized implementation in this paper on three sets of parameters is significant. The key generation, key encapsulation, and key decapsulation improvements are 56%-91%, 74%-90%, and 70%-83% respectively.

  • [1]
    Rivest R L, Shamir A, Adleman L. A method for obtaining digital signatures and public-key cryptosystems[J]. Communications of the ACM, 1978, 21(2): 120−126 doi: 10.1145/359340.359342
    [2]
    Koblitz N. Elliptic curve cryptosystems[J]. Mathematics of Computation, 1987, 48(177): 203−209 doi: 10.1090/S0025-5718-1987-0866109-5
    [3]
    National Institute of Standards and Technology. Post-quantum cryptography call for proposals [EB/OL]. [2024-09-14]. https://csrc.nist.gov/Projects/post-quantum-cryptography/post-quantum-cryptography-standardization/Call-for-Proposals
    [4]
    Avanzi R, Bos J, Ducas L, et al. CRYSTALS-Kyber algorithm specifications and supporting documentation[J]. NIST PQC Round, 2019, 2(4): 1−43
    [5]
    Fouque P A, Kirchner P, Pornin T, et al. BAT: Small and fast KEM over NTRU lattices [J/OL]. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2022: 240−265 [2024-09-14]. https://tches.iacr.org/index.php/TCHES/article/view/9487
    [6]
    Zhang Jiang, Feng Dengguo, Yan Di. NEV: Faster and smaller NTRU encryption using vector decoding [C/OL]//Proc of ASIACRYPT 2023. Berlin: Springer, 2023: 157−189 [2024-09-14]. https://link.springer.com/chapter/10.1007/978-981-99-8739-9_6
    [7]
    Jin Zhengzhong, Shen Shiyu, Zhao Yunlei. Compact and flexible KEM from ideal lattice[J]. IEEE Transactions on Information Theory, 2022, 68(6): 3829−3840 doi: 10.1109/TIT.2022.3148586
    [8]
    Liang Zhichuang, Fang Boyue, Zheng Jieyu, et al. Compact and efficient KEMs over NTRU lattices[J]. Computer Standards & Interfaces, 2024, 89: 103828
    [9]
    密码行业标准化技术委员会. 2023年度密码行业标准制修订任务(商用密码领域)[S]. 北京:密码行业标准化技术委员会,2023

    Cryptography Standardization Technical Committee. 2023 Annual Encryption Industry Standard Formulation and Revision Tasks (Commercial Encryption Field) [S]. Beijing: Cryptography Standardization Technical Committee, 2023 (in Chinese)
    [10]
    Intel Corporation. Intel instruction set architecture extensions [EB/OL]. [2024-09-14]. http://software.intel.com/en-us/isa-extensions
    [11]
    Lomont C. Introduction to intel advanced vector extensions [EB/OL]. [2024-09-14]. https://hpc.llnl.gov/sites/default/files/intelAVXintro.pdf
    [12]
    Reinders J R. Intel AVX−512 Instructions [EB/OL]. [2024-09-14]. https://www.intel.com/content/www/us/en/developer/articles/technical/intel-avx-512-instructions.html
    [13]
    Bakos J D. Embedded Systems: ARM Programming and Optimization [M]. Amsterdam: Elsevier, 2023
    [14]
    Dang V B, Mohajerani K, Gaj K. High-speed hardware architectures and FPGA benchmarking of CRYSTALS-Kyber, NTRU, and Saber[J]. IEEE Transactions on Computers, 2022, 72(2): 306−320
    [15]
    El Khatib R, Azarderakhsh R, Mozaffari-Kermani M. High-performance FPGA accelerator for SIKE[J]. IEEE Transactions on Computers, 2021, 71(6): 1237−1248
    [16]
    Ricci S, Malina L, Jedlicka P, et al. Implementing CRYSTALS-Dilithium signature scheme on FPGAs [C/OL]//Proc of the 16th Int Conf on Availability, Reliability and Security. New York: ACM, 2021 [2024-09-14]. https://dl.acm.org/doi/abs/10.1145/3465481.3465756
    [17]
    Shim K A, Lee S, Koo N. Efficient implementations of Rainbow and UOV using AVX2 [J]. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2022: 245−269 [2024-09-14]. https://tches.iacr.org/index.php/TCHES/article/view/9296
    [18]
    Chou T, Liou J H. A constant-time AVX2 implementation of a variant of ROLLO [J]. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2022: 152−174 [2024-09-14]. https://tches.iacr.org/index.php/TCHES/article/view/9293
    [19]
    Drucker N, Gueron S. Speed up over the Rainbow [C] // Proc of the 18th Int Conf on Information Technology-New Generations. Berlin: Springer, 2021: 131−136
    [20]
    Abdulrahman A, Hwang V, Kannwischer M J, et al. Faster Kyber and Dilithium on the cortex-m4 [C] //Proc of the Int Conf on Applied Cryptography and Network Security. Berlin: Springer, 2022: 853−871
    [21]
    Botros L, Kannwischer M J, Schwabe P. Memory-efficient high-speed implementation of Kyber on Cortex-M4 [C] //Proc of the 11th Int Conf on Cryptology in Africa. Berlin: Springer, 2019: 209−228
    [22]
    Kannwischer M J, Rijneveld J, Schwabe P. Faster multiplication in on Cortex-M4 to speed up NIST PQC candidates [C] //Proc of the Int Conf on Applied Cryptography and Network Security. Berlin: Springer, 2019: 281−301
    [23]
    Chung C M M, Hwang V, Kannwischer M J, et al. NTT multiplication for NTT-unfriendly rings: New speed records for Saber and NTRU on Cortex-M4 and AVX2 [J/OL]. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2021: 159−188 [2024-09-14]. https://tches.iacr.org/index.php/TCHES/article/view/8791
    [24]
    Hwang V. Pushing the limit of vectorized polynomial multiplications for NTRU prime [C] //Proc of the Australasian Conf on Information Security and Privacy. Berlin: Springer, 2024: 84−102
    [25]
    Alkim E,Cheng D Y L,Chung C M M,et al. Polynomial multiplication in NTRU prime:Comparison of optimization strategies on Cortex-M4 [J]. Cryptology ePrint Archive,2020
    [26]
    Dai Wei, Whyte W, Zhang Zhenfei. Optimizing polynomial convolution for NTRUEncrypt[J]. IEEE Transactions on Computers, 2018, 67(11): 1572−1583
    [27]
    Becker H,Mera J M B,Karmakar A,et al. Polynomial multiplication on embedded vector architectures [J]. Cryptology ePrint Archive,2021
    [28]
    Bernstein D J, Brumley B B, Chen M S, et al. OpenSSLNTRU: Faster post-quantum TLS key exchange [C] //Proc of the 31st USENIX Security Symp. Berkeley, CA: USENIX Association, 2022: 845−862
    [29]
    Fujisaki E,Okamoto T. Secure integration of asymmetric and symmetric encryption schemes [C] //Proc of the Annual Int Cryptology Conf. Berlin:Springer,1999:537−554
    [30]
    Hofheinz D,Hövelmanns K,Kiltz E. A modular analysis of the Fujisaki-Okamoto transformation [C] //Proc of the Theory of Cryptography Conf. Berlin:Springer,2017:341−371
    [31]
    Duman J,Hövelmanns K,Kiltz E,et al. Faster lattice-based KEMs via a generic Fujisaki-Okamoto transform using prefix hashing[C] //Proc of the 2021 ACM SIGSAC Conf on Computer and Communications Security. New York:ACM,2021:2722−2737
    [32]
    Nussbaumer H J, Nussbaumer H J. Number theoretic transforms [J/OL]. Fast Fourier Transform and Convolution Algorithms, 1981: 211−240[2024-09-14]. https://link.springer.com/chapter/10.1007/978-3-662-00551-4_8
    [33]
    Seiler G. Faster AVX2 optimized NTT multiplication for Ring-LWE lattice cryptography [J]. Cryptology ePrint Archive,2018
    [34]
    Cooley J W, Tukey J W. An algorithm for the machine calculation of complex Fourier series[J]. Mathematics of Computation, 1965, 19(90): 297−301 doi: 10.1090/S0025-5718-1965-0178586-1
    [35]
    Gentleman W M, Sande G. Fast fourier transforms: For fun and profit [C]//Proc of the Fall Joint Computer Conf. New York: ACM, 1966: 563−578
    [36]
    Weimerskirch A, Paar C. Generalizations of the Karatsuba algorithm for efficient implementations [J]. Cryptology ePrint Archive, 2006
    [37]
    Bernstein D J, Yang B Y. Fast constant-time gcd computation and modular inversion [J/OL]. IACR Transactions on Cryptographic Hardware and Embedded Systems, 2019: 340−398 [2024-09-14]. https://tches.iacr.org/index.php/TCHES/article/view/8298
    [38]
    Fog A. Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD, and VIA CPUs [EB/OL]. [2024-09-14]. https://www.agner.org/optimize/instruction_tables.pdf
    [39]
    Barrett P. Implementing the Rivest Shamir and Adleman public key encryption algorithm on a standard digital signal processor [C]//Proc of the Conf on the Theory and Application of Cryptographic Techniques. Berlin: Springer, 1986: 311−323
    [40]
    Montgomery P L. Modular multiplication without trial division[J]. Mathematics of Computation, 1985, 44(170): 519−521 doi: 10.1090/S0025-5718-1985-0777282-X
    [41]
    Nguyen D T, Gaj K. Optimized software implementations of CRYSTALS-Kyber, NTRU, and Saber using NEON-based special instructions of ARMv8 [C/OL]//Proc of the 3rd NIST PQC Standardization Conf. 2021 [2024-09-14]. https://csrc.nist.gov/CSRC/media/Events/third-pqc-standardization-conference/documents/accepted-papers/nguyen-optimized-software-gmu-pqc2021.pdf
    [42]
    Greub W H. Linear Algebra [M]. Berlin: Springer, 2012
    [43]
    Dworkin M J. SHA−3 Standard: Permutation-based Hash and Extendable-output Functions [S]. Gaithersburg: The National Institute of Standards and Technology (NIST), 2015
    [44]
    Schwabe P, Seiller G, Lepoint T et al. Kyber-post-quantum cryptography [CP/OL]. [2024-09-11]. https://github.com/pq-crystals/kyber
  • Related Articles

    [1]Peng Yingtao, Meng Xiaofeng, Du Zhijuan. Survey on Diversified Recommendation[J]. Journal of Computer Research and Development, 2025, 62(2): 285-313. DOI: 10.7544/issn1000-1239.202330600
    [2]MB-HGCN: A Hierarchical Graph Convolutional Network for Multi-behavior Recommendation“CCIR 2024推荐”[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202440770
    [3]Zeng Weixin, Zhao Xiang, Tang Jiuyang, Tan Zhen, Wang Wei. Iterative Entity Alignment via Re-Ranking[J]. Journal of Computer Research and Development, 2020, 57(7): 1460-1471. DOI: 10.7544/issn1000-1239.2020.20190643
    [4]Dai Chenchao, Wang Hongyuan, Ni Tongguang, Chen Shoubing. Person Re-Identification Based on Deep Convolutional Generative Adversarial Network and Expanded Neighbor Reranking[J]. Journal of Computer Research and Development, 2019, 56(8): 1632-1641. DOI: 10.7544/issn1000-1239.2019.20190195
    [5]Gu Liang, Yang Peng, Dong Yongqiang. A Diversified Recommendation Method for UCL in Broadcast-Storage Network[J]. Journal of Computer Research and Development, 2017, 54(8): 1631-1643. DOI: 10.7544/issn1000-1239.2017.20170128
    [6]Meng Xiangfu, Bi Chongchun, Zhang Xiaoyan, Tang Xiaoliang, Tang Yanhuan. Web Database top-k Diverse Keyword Query Suggestion Approach[J]. Journal of Computer Research and Development, 2017, 54(7): 1577-1591. DOI: 10.7544/issn1000-1239.2017.20160005
    [7]Yu Wenzhe, Sha Chaofeng, He Xiaofeng, Zhang Rong. Review Selection Considering Opinion Diversity[J]. Journal of Computer Research and Development, 2015, 52(5): 1050-1060. DOI: 10.7544/issn1000-1239.2015.20131932
    [8]Wang Xianghai, Cong Zhihuan, Fang Lingling, Song Chuanming. HMM Training Model Using Blending Population Diversity Based Aaptive Genetic Algorithm Title[J]. Journal of Computer Research and Development, 2014, 51(8): 1833-1844. DOI: 10.7544/issn1000-1239.2014.20121211
    [9]Zhang Weiguo, Yin Xia, and Wu Jianping. A Computation Method of Path Diversity Based on AS Relationships[J]. Journal of Computer Research and Development, 2012, 49(1): 167-173.
    [10]Han Jianmin, Yu Juan, Yu Huiqun, Jia Jiong. A Multi-Level l-Diversity Model for Numerical Sensitive Attributes[J]. Journal of Computer Research and Development, 2011, 48(1): 147-158.
  • Cited by

    Periodical cited type(7)

    1. 罗宇哲,李玲,侯朋朋,于佳耕,程丽敏,张常有,武延军,赵琛. 面向AIoT的协同智能综述. 计算机研究与发展. 2025(01): 179-206 . 本站查看
    2. 王蕴,林霄,楼芝兰,李军,孙卫强. 面向边缘光算力网络的上行链路资源协同调度算法. 光通信技术. 2024(03): 45-51 .
    3. 王铭源,王正国,李济顺,薛玉君. 层级式机械装备健康指数模型及管理系统构建. 金属矿山. 2024(09): 198-206 .
    4. 王睿,王岩,尹朴,齐建鹏,孙叶桃,李倩,张易达,张梅奎. 面向边缘智能的协同训练研究进展. 工程科学学报. 2023(08): 1400-1416 .
    5. 薛建强,史彦军,李波. 面向无人集群的边缘计算技术综述. 兵工学报. 2023(09): 2546-2555 .
    6. 阴彦磊,王立华,廖伟智,张万达. 融合GRU-Attention与鲸鱼算法的流程制造工艺参数云边联动优化. 计算机集成制造系统. 2023(09): 2991-3005 .
    7. 许浩,朱晓娟. SDN中基于模型划分的云边协同推理算法. 兰州工业学院学报. 2023(06): 31-37 .

    Other cited types(20)

Catalog

    Article views (65) PDF downloads (22) Cited by(27)

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return