Abstract:
SM4 is a commercial block cipher designed independently in China, and its encryption and decryption performance is a critical factor in the data confidentiality of information systems. Existing optimizations focus mainly on hardware designs and software look-up tables, which suffer from dependence on specific hardware environments, low efficiency, and vulnerability to side-channel attacks. Bitslicing processes block ciphers efficiently in parallel by reorganizing the input data, and it resists cache-based side-channel attacks. However, existing work on bitsliced block ciphers depends heavily on the hardware platform and supports only a single processor architecture, and the parallel processing pipeline is slow to start. As a result, encryption and decryption of small amounts of data can hardly take full advantage of advanced instruction sets such as SIMD (single instruction, multiple data) instructions. To address these problems, we first propose a cross-platform, general bitsliced block cipher model, which includes a general data slicing method that slices data consistently across different processor instruction sets. Based on this model, we then propose a fine-grained bitsliced SM4 optimization algorithm for SIMD instructions, which shortens the startup time of the algorithm through fine-grained plaintext slicing and reorganization together with an optimized linear transformation. Experiments show that the proposed algorithm reaches an encryption rate of up to 438.0 MBps and requires as low as 7.0 CPB (cycles per byte); compared with the look-up-table-based SM4 algorithm, encryption performance improves by an average of 80.4% to 430.3%.
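
The data reorganization behind bitslicing can be illustrated with a minimal sketch. It is not the slicing method or the optimized linear transformation proposed in this work; it only shows the general idea under simple assumptions. Plain Python integers stand in for SIMD registers, the helper names (`bitslice`, `unbitslice`, `sliced_sm4_linear`) are hypothetical, and the only SM4-specific part is the standard linear transformation L(B) = B ^ (B <<< 2) ^ (B <<< 10) ^ (B <<< 18) ^ (B <<< 24). After slicing, slice i holds bit i of every block, so a rotation becomes a re-indexing of slices and L reduces to XORs that act on all blocks at once, with no data-dependent table look-ups.

```python
MASK32 = 0xFFFFFFFF


def rotl32(x, r):
    """Rotate a 32-bit word left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def sm4_linear(b):
    """Reference (non-sliced) SM4 linear transformation L on one 32-bit word."""
    return b ^ rotl32(b, 2) ^ rotl32(b, 10) ^ rotl32(b, 18) ^ rotl32(b, 24)


def bitslice(blocks, width=32):
    """Transpose the input: slice i collects bit i of every block.

    Block j occupies bit position j ("lane" j) of each slice.
    """
    slices = [0] * width
    for j, block in enumerate(blocks):
        for i in range(width):
            slices[i] |= ((block >> i) & 1) << j
    return slices


def unbitslice(slices, count):
    """Inverse transpose: rebuild `count` blocks from the slices."""
    blocks = [0] * count
    for i, s in enumerate(slices):
        for j in range(count):
            blocks[j] |= ((s >> j) & 1) << i
    return blocks


def sliced_sm4_linear(slices):
    """Apply L to all blocks at once in sliced form.

    A left rotation by r moves bit (i - r) to bit i, so in sliced form it is
    only a re-indexing of the slice list; the XORs then process every lane
    (every block) in parallel.
    """
    w = len(slices)
    return [
        slices[i]
        ^ slices[(i - 2) % w]
        ^ slices[(i - 10) % w]
        ^ slices[(i - 18) % w]
        ^ slices[(i - 24) % w]
        for i in range(w)
    ]


if __name__ == "__main__":
    # Illustrative input words only; these are not test vectors from the paper.
    blocks = [0x01234567, 0x89ABCDEF, 0xFEDCBA98, 0x76543210]
    sliced = sliced_sm4_linear(bitslice(blocks))
    result = unbitslice(sliced, len(blocks))
    print(result == [sm4_linear(b) for b in blocks])
```

In this sketch the number of lanes equals the number of blocks packed into one slice. On real hardware that number is fixed by the register width, which is one way to see why a bitsliced pipeline needs many blocks before it is fully occupied and why small inputs are hard to process efficiently.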