Secure and Efficient Federated Learning for Continuous IoV Data Sharing

乐俊青; 谭州勇; 张迪; 刘高; 向涛; 廖晓峰

doi:10.7544/issn1000-1239.202330894

Abstract: The combination of the Internet of Vehicles (IoV) and artificial intelligence (AI) has driven the rapid development of autonomous vehicles. Sharing the IoV data distributed across different vehicles to train AI models enables more efficient and more reliable intelligent driving services. Autonomous vehicles can continuously collect IoV data such as real-time vehicle information, road images, and videos through onboard cameras and sensors, and use these data to optimize and update intelligent traffic models, compensating for the loss of model accuracy caused by changes in IoV data. We propose SEFL, a secure and efficient federated learning scheme for continuous data sharing in the IoV environment, to address problems such as inefficient IoV data collection, catastrophic forgetting caused by dynamic data updates, and privacy leakage through model training parameters. In SEFL, each vehicle uses the global model to collect only the IoV data that the model recognizes poorly, and labels each such sample with the output of highest probability, so that training samples are collected automatically. Because vehicle storage is limited, newly collected samples overwrite old ones, so the data on each vehicle change dynamically and traditional fine-tuning is prone to catastrophic forgetting. SEFL therefore includes a training algorithm based on dual knowledge distillation, which ensures that the model learns the knowledge of every sample and maintains high accuracy. In addition, to prevent the model parameters exchanged between vehicles and servers from leaking user privacy, we propose an adaptive differential privacy strategy that provides strong client-level privacy protection while minimizing the negative impact of differential privacy noise on the accuracy of the global model. Finally, we present a security analysis and evaluate SEFL on the traffic sign dataset GTSRB and a vehicle recognition dataset. The analysis and experimental results show that SEFL provides reliable, strong privacy protection and an efficient collection strategy, and that it outperforms existing federated learning-based algorithms in model accuracy.
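The sample collection rule described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes that "low recognition" means the largest softmax probability of the global model falls below a threshold, and that overwriting old samples follows a first-in-first-out policy; the names `SampleBuffer`, `threshold`, and `capacity` are hypothetical and do not come from the paper.

```python
import math
from collections import deque


def softmax(logits):
    """Convert raw model outputs into probabilities (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def pseudo_label(logits):
    """Return (label, confidence): the argmax class and its probability."""
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)
    return label, probs[label]


class SampleBuffer:
    """Bounded on-vehicle store; the oldest sample is dropped when full.

    Assumption: FIFO overwrite and a fixed confidence threshold. The
    abstract only states that new samples overwrite old ones and that
    poorly recognized data are collected.
    """

    def __init__(self, capacity, threshold):
        self.samples = deque(maxlen=capacity)
        self.threshold = threshold

    def offer(self, sample, logits):
        """Store the sample with its pseudo-label if confidence is low."""
        label, confidence = pseudo_label(logits)
        if confidence >= self.threshold:
            return False  # the global model already recognizes it well
        self.samples.append((sample, label))
        return True
```

A vehicle would call `offer` on each captured frame with the global model's outputs for that frame; only low-confidence frames are stored, each labeled with the most probable class.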

Secure and Efficient Federated Learning for Continuous IoV Data Sharing