Abstract:
Deep learning has achieved significant success in synthetic speech detection. However, deep models typically attain high accuracy on test sets that closely match their training distribution and suffer a substantial drop in accuracy in cross-dataset scenarios. To improve generalization to a new dataset, a model is often fine-tuned on the new data, but this leads to catastrophic forgetting: the knowledge learned from the old data is impaired, and performance on the old data deteriorates. Continuous learning is a prevalent approach to mitigating catastrophic forgetting. In this paper, we propose a continuous learning method called Elastic Orthogonal Weight Modification (EOWM) to address catastrophic forgetting in synthetic speech detection. EOWM mitigates knowledge degradation by adjusting both the direction and the magnitude of parameter updates when the model learns new knowledge. Specifically, it constrains the update direction to be orthogonal to the data distribution of the old tasks, and it limits the update magnitude for parameters that are important to the old tasks. The proposed method shows promising results in cross-dataset experiments on synthetic speech detection. Compared to fine-tuning, EOWM reduces the Equal Error Rate (EER) on the old dataset from 7.334% to 0.821%, a relative reduction of about 89%, and on the new dataset it reduces the EER from 0.513% to 0.315%, a relative reduction of about 39%.
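To make the two constraints concrete, the following is a minimal sketch in plain Python of one update step that combines them. It is not the paper's algorithm. It assumes a recursive projector update in the style of Orthogonal Weight Modification for the direction constraint and a simple per-parameter importance damping for the magnitude constraint. The function names `update_projector` and `eowm_like_step`, the parameters `alpha` and `lam`, the `importance` vector, and the order of the two operations are all hypothetical choices made for illustration.

```python
def update_projector(P, x, alpha=1e-3):
    """Shrink projector P along an old-task input x (OWM-style rank-1 update).

    Assumes P is a symmetric n x n list of lists and x has length n.
    Implements P <- P - (P x)(P x)^T / (alpha + x^T P x).
    """
    n = len(x)
    Px = [sum(P[i][j] * x[j] for j in range(n)) for i in range(n)]
    denom = alpha + sum(x[i] * Px[i] for i in range(n))
    return [[P[i][j] - Px[i] * Px[j] / denom for j in range(n)]
            for i in range(n)]


def eowm_like_step(w, grad, P, importance, lr=0.1, lam=1.0):
    """One illustrative update (an assumption, not the authors' exact rule).

    1. Magnitude: damp each gradient entry by 1 / (1 + lam * importance_i),
       so parameters deemed important for old tasks move less.
    2. Direction: multiply by P so the step is (nearly) orthogonal to the
       old-task inputs that were used to build P.
    """
    n = len(w)
    damped = [g / (1.0 + lam * f) for g, f in zip(grad, importance)]
    projected = [sum(P[i][j] * damped[j] for j in range(n)) for i in range(n)]
    return [wi - lr * pi for wi, pi in zip(w, projected)]


# Toy usage: one old-task input along the first axis.
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]
P = update_projector(identity, [1.0, 0.0, 0.0])
w_new = eowm_like_step([0.0, 0.0, 0.0], [1.0, 1.0, 1.0], P,
                       importance=[0.0, 9.0, 0.0])
```

In this toy case the first coordinate barely moves because it lies along the old-task input, the second moves only a little because its assumed importance is high, and the third takes the full gradient step.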