    Citation: Han Songshen, Guo Songhui, Xu Kaiyong, Yang Bo, Yu Miao. Perturbation Analysis of the Vital Region in Speech Adversarial Example Based on Frame Structure[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202221034

    Perturbation Analysis of the Vital Region in Speech Adversarial Example Based on Frame Structure

    Abstract: Current adversarial attacks on speech recognition models typically add noise across the entire utterance, which yields a large perturbation range and introduces high-frequency noise. Existing work has narrowed the perturbation range to some extent by designing optimization targets, but because a speech adversarial attack must perturb every frame to control the transcription result, further reduction has been limited. To address this, the paper examines, for the first time, the feature extraction pipeline of speech recognition systems from the perspective of frame structure. It finds that framing and windowing determine how vital regions are distributed within a frame: the importance of perturbing a given sampling point depends on where that point sits in the frame. First, regions are partitioned according to a perturbation analysis of the input features. Next, to quantify how much these sampling points matter for solving for an adversarial example, the paper proposes an adversarial example space measurement method with corresponding evaluation metrics, and designs cross-experiments that add perturbations to different intervals within the frame, thereby identifying the vital regions for perturbation. Finally, extensive experiments on multiple models show that adding adversarial perturbations to the vital regions narrows the perturbation range, offering a new perspective for generating high-quality speech adversarial examples.
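    The claim that framing and windowing make some positions within a frame matter more than others can be illustrated with a small sketch. It is not the paper's method. The Hann window, the 400-sample frame length, the 160-sample hop, and all function names are assumptions chosen for illustration. The sketch only shows that a perturbation of the same amplitude retains more energy after windowing when it sits near the centre of a frame than when it sits at the edge.

```python
import math


def hann_window(frame_len):
    """Symmetric Hann window coefficients for one frame (assumed window type)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]


def frame_signal(signal, frame_len, hop):
    """Split a signal into overlapping frames; an incomplete tail is dropped."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    return [signal[i * hop: i * hop + frame_len] for i in range(n_frames)]


def windowed_energy(delta_frame, window):
    """Energy of a perturbation frame after the window is applied."""
    return sum((d * w) ** 2 for d, w in zip(delta_frame, window))


# Hypothetical settings: 25 ms frames with a 10 ms hop at 16 kHz.
FRAME_LEN = 400
HOP = 160
window = hann_window(FRAME_LEN)

# Two unit-amplitude perturbations, each covering 100 samples of one frame:
# one at the frame edge, one around the frame centre.
edge = [1.0 if n < 100 else 0.0 for n in range(FRAME_LEN)]
centre = [1.0 if 150 <= n < 250 else 0.0 for n in range(FRAME_LEN)]

edge_energy = windowed_energy(edge, window)
centre_energy = windowed_energy(centre, window)

# Framing a 1000-sample signal gives overlapping frames, so one sample
# can appear at different positions in several frames.
frames = frame_signal(list(range(1000)), FRAME_LEN, HOP)
```

    Because frames overlap, a sample at the edge of one frame usually lies closer to the centre of a neighbouring frame, so this single-frame view is only a starting intuition. The paper identifies its vital regions through perturbation analysis and cross-experiments, which this sketch does not reproduce.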

       
