ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (11): 2350-2363.doi: 10.7544/issn1000-1239.2021.20210632

Special Issue: 2021 Special Topic on Cryptography and Cyberspace Security Governance


Stealthy Attack Towards Speaker Recognition Based on One-“Audio Pixel” Perturbation

Shen Yijie1, Li Liangcheng1, Liu Ziwei1, Liu Tiantian1, Luo Hao1, Shen Ting3, Lin Feng1,2, Ren Kui1   

  1 (Institute of Cyberspace Research, Zhejiang University, Hangzhou 310027); 2 (Key Laboratory of Blockchain and Cyberspace Governance of Zhejiang Province (Zhejiang University), Hangzhou 310027); 3 (Zhejiang Dong’an Testing Technology Co., Ltd., Hangzhou 310063)
  • Online:2021-11-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2020AAA0107700), the National Natural Science Foundation of China (62032021, 61772236, 61972348), the Zhejiang Key Research and Development Plan (2019C03133), the Leading Innovative and Entrepreneur Team Introduction Program of Zhejiang (2018R01005), the Fund of Alibaba-Zhejiang University Joint Institute of Frontier Technologies, and the Fund of Research Institute of Cyberspace Governance in Zhejiang University.

Abstract: Existing attacks on speaker recognition systems must inject a long-duration perturbation, which makes them easy for machines or administrators to detect. We propose a novel attack on speaker recognition based on a one-“audio pixel” perturbation. The attack exploits the black-box nature and search strategy of the differential evolution algorithm, which relies on neither the model's internals nor gradient information, and thereby overcomes the inability of previous work to constrain the duration of the perturbation. As a result, our attack effectively spoofs speaker recognition with a one-“audio pixel” perturbation. In particular, we design a candidate point construction model based on audio-point-disturbance tuples for time-series audio data, which solves the problem that candidate points for the differential evolution algorithm are difficult to describe in this setting. Our attack achieves a 100% success rate against 60 speakers in the LibriSpeech dataset. We also conduct extensive experiments to explore how different conditions (e.g., gender, dataset, and speaker recognition method) affect the performance of our stealthy attack; the results provide guidance for mounting effective attacks. Finally, we propose defenses against our stealthy attack based on denoising, reconstruction algorithms, and speech compression, respectively.
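
The search idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each candidate is a hypothetical (sample index, amplitude delta) tuple and that the attacker can only query a score from the model. The names `apply_point`, `toy_target_score`, and `one_point_attack`, and all parameter values, are illustrative assumptions. The scoring function is a toy stand-in, not a speaker recognition model. The loop is a plain DE/rand/1 scheme in which the mutant is used directly as the trial (equivalent to a crossover rate of 1).

```python
import math
import random


def apply_point(audio, candidate):
    """Return a copy of `audio` with exactly one sample shifted by delta."""
    index, delta = candidate
    perturbed = list(audio)
    # Keep the perturbed sample inside the normalized range [-1, 1].
    perturbed[index] = max(-1.0, min(1.0, perturbed[index] + delta))
    return perturbed


def toy_target_score(audio):
    """Hypothetical black-box score (assumption): peak absolute amplitude.

    A real attack would query a speaker recognition model here instead.
    """
    return max(abs(sample) for sample in audio)


def one_point_attack(audio, score_fn, max_delta=0.5, pop_size=20,
                     generations=30, scale=0.5, seed=0):
    """Search for one (index, delta) tuple that maximizes score_fn.

    Only score_fn outputs are used; no gradients or model internals.
    """
    rng = random.Random(seed)
    last = len(audio) - 1

    def repair(index, delta):
        # Round the index to a valid sample position and bound the delta.
        index = max(0, min(last, int(round(index))))
        delta = max(-max_delta, min(max_delta, delta))
        return (index, delta)

    population = [(rng.randint(0, last), rng.uniform(-max_delta, max_delta))
                  for _ in range(pop_size)]
    fitness = [score_fn(apply_point(audio, c)) for c in population]

    for _ in range(generations):
        for i in range(pop_size):
            others = [population[j] for j in range(pop_size) if j != i]
            a, b, c = rng.sample(others, 3)
            # DE/rand/1 mutation on both tuple fields.
            trial = repair(a[0] + scale * (b[0] - c[0]),
                           a[1] + scale * (b[1] - c[1]))
            trial_fitness = score_fn(apply_point(audio, trial))
            # Greedy selection: keep the trial only if it scores higher.
            if trial_fitness > fitness[i]:
                population[i], fitness[i] = trial, trial_fitness

    best = max(range(pop_size), key=lambda i: fitness[i])
    return population[best], fitness[best]


# Toy "clean" signal: a quiet sine wave with 400 samples.
clean = [0.1 * math.sin(0.05 * t) for t in range(400)]
best_candidate, best_score = one_point_attack(clean, toy_target_score)
```

Encoding the position and the disturbance together in one tuple lets the evolutionary search treat "where to perturb" and "how much" as a single candidate point, and it constrains the perturbation to one sample by construction.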

Key words: one-“audio pixel” perturbation, black-box attack, speaker recognition, differential evolution algorithm, perturbation attack
