A Multi-Stage Federated Learning Mechanism for Non-IID Data in the Internet of Vehicles

唐晓岚; 梁煜婷; 陈文龙

doi:10.7544/issn1000-1239.202330885


Multi-Stage Federated Learning with non-IID Data in Internet of Vehicles

Abstract

The Internet of Vehicles (IoV) plays an indispensable role in building smart cities: vehicles are not only a means of transportation but also an important carrier for collecting and transmitting information in the era of big data. As the volume of data collected by vehicles grows rapidly and public awareness of privacy protection increases, ensuring user data security and preventing data leakage in IoV has become a pressing problem. Federated learning, which moves the model rather than the data, offers a feasible way to protect user privacy while achieving good performance. However, because of differences in collection devices, geographic environments, and individual habits, the data collected by different vehicles is typically non-independent and identically distributed (non-IID), and traditional federated learning algorithms converge slowly on non-IID data. To address this challenge, this paper proposes a multi-stage federated learning mechanism for non-IID data in IoV, named FedWO. Stage 1 uses the federated averaging algorithm so that the global model quickly reaches a basic level of accuracy. Stage 2 uses federated weighted multi-party computation: the weight of each vehicle in the global model is calculated from the characteristics of its data, and aggregation yields a better-performing global model; a transmission control strategy is also applied to reduce the communication overhead of model transmission. Stage 3 is a personalized computation stage, in which each vehicle performs personalized learning on its own data and fine-tunes its local model to obtain a model that better matches the local data. Experiments on a driving behavior dataset show that, compared with traditional methods, FedWO preserves data privacy while improving accuracy in non-IID data scenarios.
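The three stages described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the Stage 2 weight formula or the transmission control rule, so the sketch takes per-vehicle weights as caller-supplied numbers and omits transmission control entirely. The function names (`weighted_average`, `fedavg`, `fine_tune`), the flat parameter vectors, and the linear model used for fine-tuning are all assumptions made for illustration.

```python
from typing import List, Sequence

Vector = List[float]


def weighted_average(models: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Aggregate parameter vectors with normalized weights.

    Stands in for Stage 2; how FedWO derives the weights from each
    vehicle's data characteristics is NOT specified in the abstract,
    so the weights here are simply supplied by the caller.
    """
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    dim = len(models[0])
    return [
        sum(w * m[i] for m, w in zip(models, weights)) / total
        for i in range(dim)
    ]


def fedavg(models: Sequence[Vector], sample_counts: Sequence[int]) -> Vector:
    """Stage 1: federated averaging, weighting each vehicle by its sample count."""
    return weighted_average(models, sample_counts)


def fine_tune(model: Vector, xs: Sequence[Vector], ys: Sequence[float],
              lr: float = 0.1, steps: int = 50) -> Vector:
    """Stage 3 (illustrative): personalize a linear model y = w . x by
    gradient descent on mean squared error over the vehicle's local data."""
    w = list(model)
    n = len(xs)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for x, y in zip(xs, ys):
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            for i, xi in enumerate(x):
                grad[i] += 2 * err * xi / n
        w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w


# Hypothetical usage with two vehicles and made-up numbers.
local_models = [[1.0, 2.0], [3.0, 4.0]]
global_stage1 = fedavg(local_models, sample_counts=[1, 3])
global_stage2 = weighted_average(local_models, weights=[0.2, 0.8])
personal = fine_tune([0.0], xs=[[1.0], [2.0]], ys=[2.0, 4.0])
```

In this sketch Stages 1 and 2 share one aggregation routine and differ only in where the weights come from, which mirrors the abstract's description of Stage 2 as a weighted refinement of plain federated averaging.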
