    High-Performance Optimization of SPHINCS+-SM3 Implementation Based on Domestic Deep Computing Unit

    • Abstract: Digital signatures play a critical role in information security, but traditional digital signature algorithms risk being broken in the post-quantum era. SPHINCS+, a digital signature framework that resists quantum computing attacks, is expected to become increasingly important. However, SPHINCS+ is computationally slow and struggles to meet the high-throughput, low-latency demands of modern cryptographic applications, which severely limits its practicality. This paper proposes an efficient optimization scheme based on a domestic deep computing unit (DCU) to accelerate SPHINCS+ instantiated with the domestic hash algorithm SM3. By improving memory copy efficiency, optimizing SM3, refining the SPHINCS+ computation flow, and choosing the best degree of computational parallelism, we implement the 128-f mode of SPHINCS+-SM3 on the DCU. Experimental results show that, compared with a traditional CPU implementation, the DCU implementation improves signature generation and verification throughput by 2603.87 times and 1281.98 times, respectively. This greatly improves the computational efficiency and practicality of SPHINCS+ and advances the domestic adoption of post-quantum cryptographic algorithms. In scenarios with high data traffic and large volumes of signature requests, the DCU implementation shows a clear performance advantage over the CPU implementation.
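For readers unfamiliar with SM3, the hash function that instantiates SPHINCS+ here, the following is a minimal reference sketch of the standard SM3 algorithm in plain Python. It is not the optimized DCU kernel described in the abstract, and the function names (`sm3`, `compress`, `rotl`, `p0`, `p1`) are illustrative choices rather than names taken from the paper. It only shows the compression loop that a hash-based signature scheme such as SPHINCS+ invokes many times per signature, which is why optimizing SM3 matters for throughput.

```python
import struct

# Initial value and 32-bit mask as defined by the SM3 standard.
IV = (0x7380166F, 0x4914B2B9, 0x172442D7, 0xDA8A0600,
      0xA96F30BC, 0x163138AA, 0xE38DEE4D, 0xB0FB0E4E)
MASK = 0xFFFFFFFF


def rotl(x, n):
    """Rotate a 32-bit word left by n bits (n is reduced mod 32)."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK


def p0(x):
    """Permutation P0, used in the compression rounds."""
    return x ^ rotl(x, 9) ^ rotl(x, 17)


def p1(x):
    """Permutation P1, used in message expansion."""
    return x ^ rotl(x, 15) ^ rotl(x, 23)


def compress(v, block):
    """Apply the SM3 compression function to one 64-byte block."""
    # Message expansion: 16 input words are extended to 68 words.
    w = list(struct.unpack(">16I", block))
    for j in range(16, 68):
        w.append(p1(w[j - 16] ^ w[j - 9] ^ rotl(w[j - 3], 15))
                 ^ rotl(w[j - 13], 7) ^ w[j - 6])
    a, b, c, d, e, f, g, h = v
    for j in range(64):
        t = 0x79CC4519 if j < 16 else 0x7A879D8A
        ss1 = rotl((rotl(a, 12) + e + rotl(t, j)) & MASK, 7)
        ss2 = ss1 ^ rotl(a, 12)
        if j < 16:
            ff = a ^ b ^ c
            gg = e ^ f ^ g
        else:
            ff = (a & b) | (a & c) | (b & c)
            gg = (e & f) | (~e & g)
        tt1 = (ff + d + ss2 + (w[j] ^ w[j + 4])) & MASK
        tt2 = (gg + h + ss1 + w[j]) & MASK
        a, b, c, d = tt1, a, rotl(b, 9), c
        e, f, g, h = p0(tt2), e, rotl(f, 19), g
    # Feed-forward: XOR the round output with the previous chaining value.
    return tuple(x ^ y for x, y in zip((a, b, c, d, e, f, g, h), v))


def sm3(msg: bytes) -> bytes:
    """Return the 32-byte SM3 digest of msg."""
    # Padding: 0x80, zero bytes, then the 64-bit big-endian bit length.
    padded = (msg + b"\x80" + b"\x00" * ((55 - len(msg)) % 64)
              + struct.pack(">Q", 8 * len(msg)))
    v = IV
    for i in range(0, len(padded), 64):
        v = compress(v, padded[i:i + 64])
    return struct.pack(">8I", *v)


if __name__ == "__main__":
    print(sm3(b"abc").hex())
```

The digest of `b"abc"` can be compared with the test vectors published in the SM3 standard. Because each message is hashed independently of the others, many such computations can in principle run side by side, which is the kind of workload that suits a massively parallel device.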

       
