Abstract:
In the field of distributed artificial intelligence, federated learning models, like centrally trained models, are vulnerable to adversarial examples at inference time. Federated adversarial training remains underexplored and faces two major challenges: 1) a trade-off between accuracy on clean samples and adversarial robustness makes it difficult to improve both simultaneously; 2) non-independent and identically distributed (Non-IID) data limit the performance gains of federated adversarial training. To address these challenges, we propose BTFAT, a framework that breaks the trade-off between robustness and accuracy in federated adversarial training. The framework includes: 1) a decision-space tightening algorithm that performs initial intra-class localization using labels while shrinking intra-class sample distances and enlarging inter-class sample distances, thereby improving both robustness and accuracy; and 2) a weight-penalty optimization algorithm that treats the global model weights as the unified optimization target and penalizes local adversarial training that deviates excessively from them, helping the decision-space algorithm counter the impact of Non-IID data distributions. We theoretically analyze the key factors limiting the robustness and accuracy gains of adversarial training, as well as the convergence of BTFAT. We further demonstrate experimentally that BTFAT outperforms state-of-the-art baselines in overall performance, convergence, time cost, and handling of Non-IID data, providing a new perspective for research on federated adversarial training. Our code is available at: https://anonymous.4open.science/r/BTFAT-11265.