Han Songshen, Guo Songhui, Xu Kaiyong, Yang Bo, Yu Miao. Perturbation Analysis of the Vital Region in Speech Adversarial Example Based on Frame Structure[J]. Journal of Computer Research and Development, 2024, 61(3): 685-700. DOI: 10.7544/issn1000-1239.202221034

Perturbation Analysis of the Vital Region in Speech Adversarial Example Based on Frame Structure

Funds: This work was supported by the National Natural Science Foundation of China (62176265).
  • Author Bio:

    Han Songshen: born in 1999. Master's degree. His main research interests include artificial intelligence security and cloud computing security.

    Guo Songhui: born in 1979. PhD, professor. His main research interests include 5G security and cloud computing security.

    Xu Kaiyong: born in 1963. PhD, professor. His main research interests include information security and trusted computing.

    Yang Bo: born in 1993. PhD candidate. His main research interests include deep learning and the security testing and evaluation of intelligent systems.

    Yu Miao: born in 1987. Master's candidate. His main research interests include artificial intelligence security and natural language processing.

  • Received Date: December 19, 2022
  • Revised Date: May 03, 2023
  • Available Online: November 30, 2023
  • Abstract: Adversarial attacks on speech recognition models typically add noise to the entire speech signal, which produces a wide perturbation range and introduces high-frequency noise. Existing research has tried to reduce the perturbation range by designing optimization targets. However, controlling the transcription result requires adding perturbations to every frame, which limits any further reduction. To address this issue, we examine the feature extraction process of speech recognition systems from the perspective of frame structure. We find that framing and windowing determine how vital regions are distributed within the frame structure: the weight of a perturbation added at a sampling point depends on where that point lies within the frame. Based on a perturbation analysis of the input features, we partition the frame into regions with shared attributes. We then propose an adversarial example space measurement method and an evaluation index that quantify the weight of sampling points in adversarial example generation. Cross-experiments that add perturbations at different intervals within the frame identify the vital regions for perturbation. Experiments on multiple models demonstrate that adding adversarial perturbations only to vital regions narrows the perturbation range, which offers a new perspective for generating high-quality audio adversarial examples.
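
The claim that a sampling point's weight depends on its position within the frame can be illustrated with a minimal sketch. It is not the authors' measurement method or evaluation index. It assumes a Hamming window, a 400-sample frame, and a 160-sample hop (25 ms and 10 ms at 16 kHz), none of which are stated in the abstract, and the helper names `split_into_frames` and `interval_energy_after_window` are hypothetical. The sketch compares how much energy a unit perturbation keeps after windowing when it is placed at the edge of a frame versus at its centre.

```python
import numpy as np

FRAME_LEN = 400  # assumed frame length: 25 ms at 16 kHz (not from the paper)
HOP = 160        # assumed frame shift: 10 ms at 16 kHz (not from the paper)


def split_into_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Split a 1-D signal into overlapping frames, dropping the incomplete tail."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    starts = hop * np.arange(n_frames)
    # Row i holds samples starts[i] .. starts[i] + frame_len - 1.
    index = starts[:, None] + np.arange(frame_len)[None, :]
    return signal[index]


def interval_energy_after_window(start, stop, frame_len=FRAME_LEN):
    """Energy that a unit perturbation on in-frame samples [start, stop)
    retains after the (assumed) Hamming window is applied."""
    window = np.hamming(frame_len)
    delta = np.zeros(frame_len)
    delta[start:stop] = 1.0
    return float(np.sum((delta * window) ** 2))


# One second of synthetic audio stands in for a real utterance.
signal = np.random.default_rng(0).standard_normal(16000)
frames = split_into_frames(signal)

# Same perturbation width (80 samples), different in-frame positions.
edge_energy = interval_energy_after_window(0, 80)
centre_energy = interval_energy_after_window(160, 240)

print("frames:", frames.shape)
print("edge interval energy:  ", edge_energy)
print("centre interval energy:", centre_energy)
```

Under these assumptions the window attenuates a perturbation near the frame edge far more than an equally wide one near the centre, which is one simple way to see why in-frame position could matter. Because frames overlap, a real analysis would also have to account for each sample appearing at different positions in neighbouring frames.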
