Perturbation Analysis of the Vital Region in Speech Adversarial Example Based on Frame Structure

韩松莘; 郭松辉; 徐开勇; 杨博; 于淼

doi:10.7544/issn1000-1239.202221034

Abstract

Current adversarial attacks on speech recognition models mainly add noise to the entire utterance, which yields a large perturbation range and introduces high-frequency noise. Existing work has narrowed the perturbation range to some extent by designing optimization targets, but controlling the transcription result still requires perturbing every frame, which limits further reduction. To address this problem, we examine the feature extraction pipeline of speech recognition systems from the perspective of frame structure and find that framing and windowing determine how vital regions are distributed within a frame: the importance of perturbing a sampling point depends on its position in the frame. First, we partition each frame into regions according to a perturbation analysis of the input features. Then, to quantify how much these sampling points matter when solving for adversarial examples, we propose an adversarial example space measurement method with corresponding evaluation metrics, and design cross-experiments that add perturbations on different intervals within the frame, which identify the vital regions for perturbation. Finally, extensive experiments on multiple models show that adding adversarial perturbations in the vital regions narrows the perturbation range, offering a new perspective for generating high-quality speech adversarial examples.
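The position dependence described in the abstract can be illustrated with a minimal sketch. It is not the paper's measurement method. It assumes a common front-end configuration that the abstract does not specify: 16 kHz audio, 400-sample frames, a 160-sample hop, and a Hamming window. The names `frame_signal` and `windowed_perturbation_energy` are hypothetical helpers introduced only for this illustration. The sketch compares how much of a unit perturbation survives windowing when it is placed at the center of a frame versus at its edge.

```python
import numpy as np

# Assumed front-end settings (not taken from the paper): 16 kHz audio,
# 25 ms frames (400 samples), 10 ms hop (160 samples), Hamming window.
FRAME_LEN = 400
HOP = 160


def frame_signal(x, frame_len=FRAME_LEN, hop=HOP):
    """Split a 1-D signal into overlapping frames (no padding)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]


def windowed_perturbation_energy(delta, frame_len=FRAME_LEN, hop=HOP):
    """Per-frame energy of a perturbation after framing and Hamming windowing."""
    window = np.hamming(frame_len)
    frames = frame_signal(delta, frame_len, hop) * window
    return np.sum(frames ** 2, axis=1)


# Two single-frame perturbations with identical raw energy:
# a unit impulse at the frame center and a unit impulse at the frame edge.
center = np.zeros(FRAME_LEN)
center[FRAME_LEN // 2] = 1.0
edge = np.zeros(FRAME_LEN)
edge[0] = 1.0

center_energy = windowed_perturbation_energy(center)[0]
edge_energy = windowed_perturbation_energy(edge)[0]
print(f"center: {center_energy:.4f}, edge: {edge_energy:.4f}")
```

Under these assumptions the window attenuates the edge impulse far more than the center impulse, so the same perturbation budget reaches the extracted features very differently depending on where in the frame it is spent. With overlapping frames, a sample near the edge of one frame may sit closer to the center of a neighboring frame, so a full analysis would have to aggregate over all frames that contain the sample.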
