Abstract:
The combination of the Internet of Vehicles (IoV) and artificial intelligence (AI) has driven the rapid development of autonomous vehicles. Sharing IoV data distributed across different vehicles to train AI models enables more efficient and reliable intelligent driving services. Autonomous vehicles can continuously gather IoV data, such as real-time vehicle information, road images, and videos, through onboard cameras and sensors. These data are then used to optimize and update intelligent traffic models, counteracting the loss of model accuracy that occurs when IoV data change. We propose an efficient and secure federated learning scheme (named SEFL) for continuous data sharing in an IoV environment. SEFL addresses three problems: inefficient data collection, catastrophic forgetting caused by dynamic data updates, and privacy leakage from model training parameters. To collect training samples automatically, each vehicle uses the global model to collect only those IoV data with low recognition accuracy, and the output with the highest probability is used as the label for each such sample. Because vehicle storage space is limited and new samples can overwrite old ones, the data on vehicles change dynamically, making traditional fine-tuning methods prone to catastrophic forgetting. SEFL therefore proposes a training algorithm based on dual knowledge distillation to ensure that the model learns the knowledge of each sample and maintains high accuracy. In addition, to prevent privacy leakage from the model parameters exchanged between vehicles and servers, an adaptive differential privacy strategy is proposed to achieve client-level privacy protection while minimizing the negative impact of differential privacy noise on the accuracy of the global model. Finally, a security analysis and performance evaluation of the SEFL scheme are conducted using the GTSRB dataset and a vehicle identification dataset.
The analysis and experimental results indicate that the proposed SEFL scheme can provide strong privacy protection and efficient data collection. Furthermore, SEFL outperforms existing federated learning-based algorithms in terms of model accuracy.