A Fast and Secure Transformer Inference Scheme with Secure Multi-Party Computation

Liu Weixin; Guan Yewei; Huo Jiarong; Ding Yuanchao; Guo Hua; Li Bo

doi:10.7544/issn1000-1239.202330966

Liu Weixin, Guan Yewei, Huo Jiarong, Ding Yuanchao, Guo Hua, Li Bo. A Fast and Secure Transformer Inference Scheme with Secure Multi-Party Computation[J]. Journal of Computer Research and Development, 2024, 61(5): 1218-1229. DOI: 10.7544/issn1000-1239.202330966

Abstract

The Transformer has been widely used in fields such as natural language processing and computer vision, where it delivers outstanding performance. During inference, however, users' data is exposed to the Transformer model provider. With growing public attention to data privacy, this leakage problem has motivated research on secure Transformer inference, and implementing it with secure multi-party computation (MPC) is currently an active topic. Because non-linear functions are pervasive in the Transformer, implementing secure Transformer inference with MPC is difficult and incurs heavy computation and communication costs. We focus on Softmax attention, the bottleneck in secure Transformer inference, and propose two MPC-friendly attention mechanisms: Softmax freeDiv Attention and 2Quad freeDiv Attention. By replacing the Softmax attention in the Transformer with the proposed MPC-friendly attention mechanisms, and combining this with replacement of the GeLU activation function and knowledge distillation, we propose an MPC-friendly Transformer conversion framework that converts a Transformer model into an MPC-friendly one, thereby improving the performance of subsequent secure inference. Based on this framework, we perform secure BERT-Base inference on SST-2 in a LAN setting, using the privacy-preserving computation protocols provided by the secure processing unit (SPU). The results show that secure inference achieves a 2.26× speedup while maintaining accuracy comparable to that of the non-approximated model.
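
To illustrate why Softmax attention is costly under MPC, the following minimal plaintext sketch contrasts standard Softmax attention with a generic quadratic stand-in for Softmax. It is not the authors' Softmax freeDiv or 2Quad freeDiv Attention, whose definitions the abstract does not give. The function names, the toy inputs, and the shift constant `c` are assumptions made purely for illustration.

```python
import math

def softmax_row(scores):
    """Standard Softmax: needs an exponential and a division,
    both of which are expensive to evaluate on secret-shared data."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def quad_row(scores, c=5.0):
    """Illustrative quadratic stand-in: (s + c)^2 normalised by its sum.
    The shift c is an assumed constant, not a value taken from the paper."""
    squares = [(s + c) ** 2 for s in scores]
    total = sum(squares)
    return [q / total for q in squares]

def attention(q, k, v, normalise):
    """Single-head attention over lists of row vectors.
    `normalise` turns a row of scaled dot-product scores into weights."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d) for kj in k]
        weights = normalise(scores)
        out.append([sum(w * vj[t] for w, vj in zip(weights, v))
                    for t in range(len(v[0]))])
    return out

# Toy inputs (hypothetical): two tokens, dimension two.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]

exact = attention(q, k, v, softmax_row)
approx = attention(q, k, v, quad_row)
```

The quadratic variant replaces the exponential with a multiplication, which MPC protocols handle cheaply, but it still performs one division per row. The "freeDiv" naming in the abstract suggests the proposed mechanisms target that remaining division as well; how they do so is not stated here.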
