Optimization of Multi-Agent Handover Based on Team Model in C-V2X Environments

刘冰艺; 王东东; 施海勇; 王恩澍; 吴黎兵; 汪建平

doi:10.7544/issn1000-1239.202440404

Abstract: Cellular vehicle-to-everything (C-V2X) communication is a key component of future intelligent transportation systems (ITS). Millimeter wave (mmWave), one of the primary carriers for C-V2X, offers users high bandwidth. However, because mmWave signals have a limited propagation distance and are sensitive to blockage, base stations must be densely deployed to maintain reliable communication. As a result, intelligent connected vehicles (ICVs) must perform frequent handovers while driving, which easily causes local resource shortages and in turn degrades service quality and user experience. To address these challenges, we treat each ICV as an agent and model the ICV handover problem as a cooperative multi-agent game. To solve it, we propose a cooperative reinforcement learning framework based on a teammate model. Specifically, we first design a teammate model that quantifies the interdependencies among agents in complex dynamic environments. We then propose a dynamic weight allocation scheme that generates weighted mutual information among teammates as input to the mixing network, helping teammates hand over to base stations that provide good quality of service (QoS) and quality of experience (QoE), and thereby achieving high throughput and a low handover frequency. For training, we design an incentive-compatible training algorithm that aligns the agents' individual goals with the collective goal and improves communication throughput. Experimental results show that the proposed method performs strongly in scenarios with different numbers of vehicles, improving throughput by 13.8% to 38.2% over existing communication-based baseline methods.
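
The abstract does not specify how the weighted mutual information is computed, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes each agent's base-station choices are logged as discrete action sequences, estimates pairwise mutual information from empirical counts, and scales each estimate by a softmax weight. The function names and the `scores` input (a hypothetical per-teammate relevance score) are introduced here for illustration only.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two equal-length discrete sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log(p(x, y) / (p(x) * p(y))), written in terms of counts
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def weighted_teammate_mi(own_actions, teammate_actions, scores):
    """Weight each teammate's mutual information with the agent's own actions.

    `scores` is a hypothetical relevance score per teammate (an assumption,
    not taken from the paper); the softmax turns it into weights summing to 1.
    """
    weights = softmax(scores)
    mis = [mutual_information(own_actions, t) for t in teammate_actions]
    return [w * mi for w, mi in zip(weights, mis)]


# Usage: agent 0 picks base stations 0/1; teammate A mirrors it, teammate B is independent.
own = [0, 1, 0, 1]
teammates = [[0, 1, 0, 1], [0, 0, 1, 1]]
features = weighted_teammate_mi(own, teammates, scores=[0.0, 0.0])
```

In this toy usage, the mirrored teammate contributes a nonzero weighted value and the independent teammate contributes zero; in a value-decomposition setup, such a vector could serve as one input to a mixing network.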
