    Citation: Liu Weixin, Guan Yewei, Huo Jiarong, Ding Yuanchao, Guo Hua, Li Bo. A Fast and Secure Transformer Inference Scheme with MPC[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202330966

    A Fast and Secure Transformer Inference Scheme with MPC

    Abstract: Transformer models are widely used and achieve outstanding performance in many fields such as natural language processing and computer vision. During inference, however, users' data are exposed to the Transformer model provider. As public concern about data privacy grows, this leakage problem has motivated research on secure Transformer inference, and implementing it with secure multi-party computation (MPC) is currently a research hotspot. Because Transformer models contain a large number of non-linear functions, realizing secure Transformer inference with MPC incurs huge computation and communication costs. This paper focuses on the Softmax attention mechanism, a major bottleneck in secure Transformer inference, and proposes two MPC-friendly attention mechanisms: Softmax freeDiv Attention and 2Quad freeDiv Attention. By replacing the Softmax attention in Transformer with the proposed MPC-friendly attention mechanisms, combined with replacing the GeLU activation function and applying knowledge distillation, we build an MPC-friendly Transformer conversion framework that converts a Transformer model into an MPC-friendly one, thereby improving the efficiency of subsequent secure inference. Based on the proposed framework, we perform secure Bert-Base inference on SST-2 in a LAN setting, using the privacy computing protocols provided by the secure processing unit (SPU). The results show that secure inference achieves a 2.26× speedup while maintaining the same accuracy as the non-approximated model.
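
    The abstract names the two attention mechanisms but does not spell out their formulas. As a rough illustration of the design space, the plaintext NumPy sketch below contrasts standard Softmax attention with a hypothetical division-free quadratic variant; the function names, the shift constant c, and the exact approximations (borrowed from MPCFormer-style quadratic substitutions) are assumptions for illustration, not the paper's definitions.

        # Plaintext NumPy reference sketch, not an MPC implementation.
        # two_quad_freediv_attention and quad_gelu are hypothetical
        # reconstructions: the abstract names the mechanisms but does not
        # give their formulas, so the approximations below may differ
        # from the paper's actual construction.
        import numpy as np

        def softmax_attention(Q, K, V):
            # Baseline scaled dot-product attention. Under MPC, exp and
            # the per-row division are the dominant costs.
            d = Q.shape[-1]
            scores = Q @ K.T / np.sqrt(d)             # sqrt(d) is public, so cheap
            scores -= scores.max(-1, keepdims=True)   # numerical stability
            w = np.exp(scores)                        # expensive under MPC
            return (w / w.sum(-1, keepdims=True)) @ V # secure division: expensive

        def two_quad_freediv_attention(Q, K, V, c=5.0):
            # Hypothetical "2Quad freeDiv" variant: exp(x) is replaced by
            # the quadratic (x + c)^2 and the row normalization (the
            # division) is dropped entirely, so no secure division is
            # needed; knowledge distillation then recovers accuracy.
            d = Q.shape[-1]
            scores = Q @ K.T / np.sqrt(d)
            w = (scores + c) ** 2                     # one secure multiplication
            return w @ V                              # no exp, no division

        def quad_gelu(x):
            # Hypothetical GeLU substitute (the quadratic used by
            # MPCFormer); the abstract only states that GeLU is replaced,
            # without naming the replacement.
            return 0.125 * x**2 + 0.25 * x + 0.5

    Because a quadratic, unnormalized attention is not a numerical drop-in for Softmax, the conversion framework relies on knowledge distillation from the original model to recover accuracy, which is consistent with the reported result that accuracy is unchanged while inference is 2.26× faster.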