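  • China's Outstanding Scientific and Technical Journal
  • CCF-recommended Class A Chinese-language journal
  • T1 high-quality scientific and technical journal in the computing field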
Zhang Xiaohui, Yi Jiangyan, Tao Jianhua, Zhou Junzuo. Elastic Orthogonal Weight Modification Continual Learning Algorithm in the Context of Synthetic Speech Detection[J]. Journal of Computer Research and Development, 2025, 62(2): 336-345. DOI: 10.7544/issn1000-1239.202330311

Elastic Orthogonal Weight Modification Continual Learning Algorithm in the Context of Synthetic Speech Detection

Funds: This work was supported by the National Key Research and Development Program of China (2020AAA0140003) and the National Natural Science Foundation of China (61831022, U21B2010, 61901473, 62006223, 2101553).
  • Author Bio:

    Zhang Xiaohui: born in 1998. Master's candidate. His main research interests include deepfake audio detection and continual learning.

    Yi Jiangyan: born in 1984. PhD, Master's supervisor. Her main research interests include speech information processing, speech generation and identification, and continual learning.

    Tao Jianhua: born in 1972. PhD, PhD supervisor. His main research interests include intelligent information fusion and processing, speech processing, affective computing, and big data analysis.

    Zhou Junzuo: born in 2000. Master's candidate. His main research interest is text-to-speech.

  • Received Date: April 09, 2023
  • Revised Date: January 07, 2024
  • Available Online: December 11, 2024
  • Abstract: Deep learning has achieved significant success in synthetic speech detection. However, deep models typically attain high accuracy on test sets that closely match their training distribution but suffer a substantial drop in accuracy in cross-dataset scenarios. To improve generalization to a new dataset, a model is often fine-tuned on the new data, but this leads to catastrophic forgetting: the knowledge learned from the old data is impaired, and performance on the old data deteriorates. Continual learning is a prevalent approach to mitigating catastrophic forgetting. In this paper, we propose a continual learning algorithm called elastic orthogonal weight modification (EOWM) to address catastrophic forgetting in synthetic speech detection. EOWM mitigates knowledge degradation by adjusting both the direction and the magnitude of parameter updates when the model learns new knowledge. Specifically, it constrains the update direction to be orthogonal to the data distribution of the old tasks while limiting the update magnitude for parameters that are important to the old tasks. The proposed algorithm shows promising results in cross-dataset experiments on synthetic speech detection. Compared with fine-tuning, EOWM reduces the equal error rate (EER) on the old dataset from 7.334% to 0.821%, a relative improvement of roughly 90%, and on the new dataset it reduces the EER from 0.513% to 0.315%, a relative improvement of roughly 40%.
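The abstract describes two constraints, one on update direction and one on update magnitude, without giving the update equations. The following is a minimal sketch of how such a step might look, not the authors' implementation. It assumes the direction constraint resembles the recursive input-space projector of orthogonal weight modification and the magnitude constraint resembles the quadratic importance penalty of elastic weight consolidation. The function names (`update_projector`, `ewc_penalty_grad`, `eowm_like_step`), the parameters `alpha`, `lam`, and `lr`, and the way the two parts are combined are all hypothetical.

```python
import numpy as np

def update_projector(P, x, alpha=1e-3):
    """OWM-style recursive update: shrink P along the old-task input x.

    Assumes P is symmetric (it starts as the identity and stays symmetric).
    """
    Px = P @ x
    return P - np.outer(Px, Px) / (alpha + x @ Px)

def ewc_penalty_grad(W, W_old, importance, lam=1.0):
    """Gradient of 0.5 * lam * sum(importance * (W - W_old)**2)."""
    return lam * importance * (W - W_old)

def eowm_like_step(W, grad, P, W_old, importance, lr=0.1, lam=1.0):
    """Hypothetical combined step: add the penalty gradient, then project.

    W has shape (out, in) and P has shape (in, in), so multiplying by P on
    the right suppresses update components along old-task input directions.
    """
    total = grad + ewc_penalty_grad(W, W_old, importance, lam)
    return W - lr * (total @ P)

# Toy usage: two old-task inputs along the first two axes (made-up data).
P = np.eye(4)
for x_old in (np.array([1.0, 0.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0, 0.0])):
    P = update_projector(P, x_old)

W_old = np.zeros((2, 4))
W_new = eowm_like_step(W_old, np.ones((2, 4)), P, W_old, np.ones((2, 4)))

# Change in the layer output for an old-task input and for an unseen direction.
old_response_change = (W_new - W_old) @ np.array([1.0, 0.0, 0.0, 0.0])
new_response_change = (W_new - W_old) @ np.array([0.0, 0.0, 1.0, 0.0])
```

In this toy setting the layer's response to the old-task input barely changes, while the response along an unseen direction is free to move. The importance weights (here all ones) would in practice come from an estimate such as a diagonal Fisher information, which is again an assumption rather than something the abstract states.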
