
FedDW: Distilling Weights through Consistency Optimization in Heterogeneous Federated Learning

Abstract: Federated learning is a classic distributed machine learning paradigm that enables collaborative model training without centralizing data, which gives it clear advantages for preserving data privacy. However, pronounced data heterogeneity across clients and the growing scale of federations pose serious challenges to training efficiency and model performance. Previous studies have shown that, under independent and identically distributed (IID) data, a model's parameter structure typically satisfies certain consistency relationships, and that these relationships are often preserved in the intermediate results of neural network training. If such consistency relationships can be identified and regularized under non-IID data, the parameter distribution can be aligned toward the IID case, mitigating the impact of data heterogeneity. Building on this idea, this paper first introduces the concept of deep learning encrypted data and, based on it, proposes a consistency optimization paradigm; it then exploits the consistency relationship between soft labels and the classification-layer weight matrix to construct a new federated learning framework, FedDW. Experiments are conducted on four public datasets and multiple neural network architectures, including ResNet and ViT. The results show that, under highly heterogeneous data settings, the proposed method improves average accuracy by about 3% over ten state-of-the-art federated learning methods. In addition, the paper proves theoretically that the method is more training-efficient, with negligible additional back-propagation overhead.
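The abstract does not spell out the regularizer itself, so the following is only a minimal PyTorch sketch of one plausible way a consistency term between per-class mean soft labels and the classification-layer weight matrix could be attached to a client's local loss. The function name `consistency_penalty`, the C x C comparison against a row-normalized weight-similarity matrix, the temperature, and the use of `model.fc.weight` are illustrative assumptions, not the FedDW formulation.

```python
# Hypothetical sketch (not the paper's exact method): regularize agreement between
# per-class mean soft labels and the similarity structure of the classifier weights.
import torch
import torch.nn.functional as F


def consistency_penalty(logits: torch.Tensor,
                        labels: torch.Tensor,
                        classifier_weight: torch.Tensor,
                        num_classes: int,
                        temperature: float = 2.0) -> torch.Tensor:
    """Penalize disagreement between class-wise mean soft labels and classifier rows."""
    soft = F.softmax(logits / temperature, dim=1)                 # (B, C) soft labels
    onehot = F.one_hot(labels, num_classes).float()               # (B, C)
    counts = onehot.sum(dim=0).clamp(min=1.0)                     # (C,)
    # Mean soft label per class observed in this mini-batch.
    class_mean_soft = (onehot.t() @ soft) / counts.unsqueeze(1)   # (C, C)

    # Row-normalized similarity of classification-layer weights, mapped to the
    # same C x C scale via a softmax, so the two matrices are comparable.
    w = F.normalize(classifier_weight, dim=1)                     # (C, d)
    weight_sim = F.softmax(w @ w.t() / temperature, dim=1)        # (C, C)

    # Only penalize classes actually present in this client's mini-batch.
    present = (onehot.sum(dim=0) > 0).float().unsqueeze(1)        # (C, 1)
    return ((class_mean_soft - weight_sim).pow(2) * present).sum() / present.sum().clamp(min=1.0)


# Illustrative use inside a client's local training step (model.fc assumed to be
# the classification layer; lam is a tunable weight for the consistency term):
#   loss = F.cross_entropy(logits, labels) + lam * consistency_penalty(
#       logits, labels, model.fc.weight, num_classes=10)
```

Because the penalty only adds a small C x C comparison on top of quantities already computed in the forward pass, its extra back-propagation cost is minor, which is consistent with the efficiency claim in the abstract.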

       
