ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (7): 1563-1572. doi: 10.7544/issn1000-1239.2021.20200018

• Network Technology •



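  1. (PLA Strategic Support Force Information Engineering University, Zhengzhou 450002)
  • Published: 2021-07-01
  • Funding: 
    National Key Research and Development Program of China (2020YFB1804803); National Natural Science Foundation of China (62002382, 61702547, 61872382); Key-Area Research and Development Program of Guangdong Province (2018B010113001)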

Pinning Control-Based Routing Policy Generation Using Deep Reinforcement Learning

Sun Penghao, Lan Julong, Shen Juan, Hu Yuxiang   

  1. (PLA Strategic Support Force Information Engineering University, Zhengzhou 450002)
  • Online: 2021-07-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2020YFB1804803), the National Natural Science Foundation of China (62002382, 61702547, 61872382), and the Key Research and Development Project of Guangdong Province (2018B010113001).

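Abstract (from the Chinese original): The rapid growth of network scale has steadily increased the complexity of network traffic, making it harder to model traffic characteristics accurately. In recent years, deep reinforcement learning has been proposed for generating network routing intelligently, which to some extent overcomes the shortcomings of manual traffic analysis and modeling. However, the solutions proposed so far generally suffer from problems such as poor scalability. To address this, the paper proposes Hierar-DRL, a deep reinforcement learning routing policy generation technique based on pinning control theory. By introducing pinning control theory and combining it with the automatic policy search capability of deep reinforcement learning, Hierar-DRL improves the scalability of intelligent routing algorithms. Simulation results show that the proposed scheme reduces end-to-end delay by up to 28.5% compared with the current best scheme, demonstrating the effectiveness of the proposed intelligent routing scheme.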

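Key words (from the Chinese original): routing optimization, software-defined networking, artificial intelligence, deep reinforcement learning, pinning control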

Abstract: Computer networks play an important role in modern society. The rapid growth of network scale makes network traffic increasingly complex and hard to model accurately, which makes finding the optimal routing policy in communication networks an NP-hard problem. Traditional routing and traffic engineering methods rely mainly on hand-crafted algorithms, which cannot guarantee both accuracy and efficiency. In recent years, routing strategies based on deep reinforcement learning (DRL) have been proposed; they overcome, to some extent, the shortcomings of manual analysis and modeling by human experts. However, current DRL-based routing strategies all suffer from limited scalability and therefore cannot be used in large-scale networks. To address this, this paper proposes Hierar-DRL, a DRL-based network routing technique that employs pinning control theory. Pinning control lets Hierar-DRL select a subset of network nodes as the target control nodes of DRL. By combining pinning control with the automatic policy exploration ability of DRL, Hierar-DRL scales better to large networks. Simulation results show that the proposed scheme reduces the average end-to-end transmission delay in the test network topologies by up to 28.5% compared with the state-of-the-art scheme, which validates its effectiveness.

Key words: routing optimization, software-defined networking, artificial intelligence, deep reinforcement learning, pinning control
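
The abstract states that pinning control lets Hierar-DRL select a subset of network nodes as the target control nodes of DRL, but it does not give the selection rule. The sketch below illustrates the general idea only. It assumes a degree-based heuristic, a common choice in the pinning control literature and not necessarily the authors' method. The function name `select_pinned_nodes`, the `fraction` parameter, and the toy topology are all hypothetical.

```python
from collections import defaultdict


def select_pinned_nodes(edges, fraction=0.25):
    """Return the highest-degree nodes as hypothetical pinned nodes.

    ASSUMPTION: ranking by node degree is an illustrative heuristic,
    not the selection rule used by Hierar-DRL.
    """
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Always pin at least one node, even for very small fractions.
    k = max(1, round(fraction * len(degree)))
    # Sort by descending degree; break ties by node name for determinism.
    ranked = sorted(degree, key=lambda n: (-degree[n], n))
    return ranked[:k]


# Toy 6-node undirected topology (illustrative only, not from the paper).
edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("d", "e"), ("d", "f")]
pinned = select_pinned_nodes(edges, fraction=1 / 3)
print(pinned)
```

In a design of this kind, only the pinned nodes would be exposed to the DRL agent as controllable, so the action space grows with the size of the pinned subset instead of with the whole network. That is the scalability argument the abstract makes.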