Multi-Platform Efficient Implementation and Optimization of Aigis-enc Algorithm
-
Abstract
The rapid development of quantum computing has made post-quantum cryptography (PQC) an active research topic in the cryptographic community. Aigis-enc is a post-quantum key encapsulation mechanism based on the asymmetric module learning with errors (A-MLWE) problem, and it is one of the algorithms that won first prize in the public key cryptographic algorithm category of the National Cryptographic Algorithm Design Competition held by the Chinese Association for Cryptologic Research. To resist quantum attacks, maintain the long-term security of national cyberspace, and contribute to future national PQC algorithm standards, it is important to optimize the post-quantum cryptographic algorithms developed by Chinese scholars. In this paper, we optimize Aigis-enc for two classes of platform: a fast parallel implementation for high-performance platforms and a compact implementation for low-power embedded platforms. Specifically, we further optimize the existing AVX2 implementation of Aigis-enc using single instruction, multiple data (SIMD) instructions, and we provide the first compact implementation of Aigis-enc for the ARM Cortex-M4 platform. Our implementation includes the following optimizations: reducing the number of assembly instructions needed for Montgomery and Barrett reduction; using number theoretic transforms (NTTs) with trimmed layers and optimized instruction pipelining to speed up polynomial multiplication and shrink the precomputed tables; parallelizing polynomial serialization and deserialization with assembly instructions to speed up encoding, decoding and encryption; and combining on-the-fly computation with memory reuse to reduce the storage the algorithm requires.
Experimental results show that the proposed optimizations improve the performance of the original AVX2 implementation of Aigis-enc-768 by 25% on an 8-core Intel Core i7 processor, and significantly reduce the precomputed table storage, code size and stack usage on the ARM Cortex-M4 platform, which is of practical importance for future deployment of the algorithm.
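For readers unfamiliar with the two modular reductions named above, the following is a minimal reference sketch in Python of what Montgomery and Barrett reduction compute. It is not the optimized assembly described in this paper. The modulus `Q = 7681`, the radix `R = 2^16`, the shift `BARRETT_SHIFT = 26` and all function names are illustrative assumptions chosen for the example.

```python
# Illustrative parameters (assumptions for this sketch, not taken from the paper).
Q = 7681                      # example odd prime modulus
R_BITS = 16
R = 1 << R_BITS               # Montgomery radix, coprime to Q
QINV = pow(Q, -1, R)          # Q^{-1} mod R (requires Python 3.8+)

BARRETT_SHIFT = 26
BARRETT_MULT = (1 << BARRETT_SHIFT) // Q   # floor(2^26 / Q)


def montgomery_reduce(a: int) -> int:
    """Return a * R^{-1} mod Q for 0 <= a < Q * R."""
    m = (a * QINV) & (R - 1)      # m = a * Q^{-1} mod R
    t = (a - m * Q) >> R_BITS     # a - m*Q is divisible by R, so this is exact
    return t + Q if t < 0 else t  # t lies in (-Q, Q); map it into [0, Q)


def barrett_reduce(a: int) -> int:
    """Return a mod Q for 0 <= a < 2**BARRETT_SHIFT."""
    t = (a * BARRETT_MULT) >> BARRETT_SHIFT  # underestimates floor(a / Q) by at most 1
    r = a - t * Q                            # r lies in [0, 2Q)
    return r - Q if r >= Q else r


def to_montgomery(a: int) -> int:
    """Map a to its Montgomery form a * R mod Q."""
    return (a * R) % Q


def montgomery_mul(a_mont: int, b_mont: int) -> int:
    """Multiply two values in Montgomery form; the result stays in Montgomery form."""
    return montgomery_reduce(a_mont * b_mont)
```

Montgomery reduction replaces the division by `Q` with a multiplication and a shift, which is why it suits the coefficient multiplications inside an NTT. Barrett reduction brings an accumulated value back into `[0, Q)` without leaving the normal representation.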