ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展), 2021, Vol. 58, Issue (12): 2783-2797. doi: 10.7544/issn1000-1239.2021.20200366

• Network Technology •

基于深度强化学习的自适应虚拟机整合方法

余显1,2,李振宇1,孙胜1,2,张广兴1,刁祖龙1,谢高岗1   

  1. 中国科学院计算技术研究所 北京 100190;2. 中国科学院大学 北京 100049 (yuxian@ict.ac.cn)
  • 出版日期: 2021-12-01
  • 基金资助: 
    国家自然科学基金项目(61725206,U20A20180);中科院奥地利合作项目(171111KYSB20200001)

Adaptive Virtual Machine Consolidation Method Based on Deep Reinforcement Learning

Yu Xian1,2, Li Zhenyu1, Sun Sheng1,2, Zhang Guangxing1, Diao Zulong1, Xie Gaogang1   

  1. Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190; 2. University of Chinese Academy of Sciences, Beijing 100049
  • Online: 2021-12-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61725206, U20A20180) and the CAS-Austria Project Plan (171111KYSB20200001).

摘要: 能耗限制的服务质量优化问题一直以来都是数据中心虚拟机资源管理所面临的巨大挑战之一.尽管现有的工作通过虚拟机整合技术一定程度上降低了能耗和提升了系统服务质量,但这些方法通常难以实现长期最优的管理目标,并且容易受到业务场景变化的影响,面临变更困难以及管理成本高等难题.针对数据中心虚拟机资源管理存在的能耗和服务质量长期最优难保证以及策略调整灵活性差的问题,提出了一种基于深度强化学习的自适应虚拟机整合方法(deep reinforcement learning-based adaptive virtual machine consolidation method, RA-VMC).该方法利用张量化状态表示、确定性动作输出、卷积神经网络和加权奖赏机制构建了从数据中心系统状态到虚拟机迁移策略的端到端决策模型;设计自动化状态生成机制和反向梯度限定机制以改进深度确定性策略梯度算法,加快虚拟机迁移决策模型的收敛速度并且保证近似最优的管理性能.基于真实虚拟机负载数据的仿真实验结果表明:与开源云平台中流行的虚拟机整合方法相比,该方法能够有效地降低能耗和提高系统的服务质量.

关键词: 数据中心, 虚拟机资源管理, 虚拟机整合, 强化学习, 深度确定性策略梯度

Abstract: Optimizing service quality under energy consumption constraints has long been one of the major challenges for virtual machine (VM) resource management in data centers. Although existing work has reduced energy consumption and improved system service quality to some extent through VM consolidation, these methods usually struggle to achieve long-term optimal management goals. Moreover, their performance is sensitive to changes in application scenarios, so they are difficult to modify and incur high management costs. To address the difficulty of guaranteeing long-term optimal energy consumption and service quality in data center VM resource management, as well as the poor flexibility of policy adjustment, this paper proposes a deep reinforcement learning-based adaptive virtual machine consolidation method (RA-VMC). The method builds an end-to-end decision-making model that maps the data center system state to a VM migration strategy, using a tensor state representation, deterministic action output, a convolutional neural network, and a weighted reward mechanism. It also designs an automatic state generation mechanism and an inverting gradient limitation mechanism to improve the deep deterministic policy gradient (DDPG) algorithm, which accelerates the convergence of the VM migration decision-making model and guarantees approximately optimal management performance. Simulation results based on real VM workload data show that, compared with popular VM consolidation methods in open-source cloud platforms, the method effectively reduces energy consumption and improves system service quality.

Key words: data center, VM resource management, VM consolidation, reinforcement learning, deep deterministic policy gradient (DDPG)
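
The abstract names an inverting gradient limitation mechanism but does not give its formula. As a rough illustration only, the sketch below shows the general inverting-gradients idea known from the literature on bounded continuous actions: each gradient component is scaled by how much room the corresponding action still has before it reaches its bound, and the component flips sign once the action has left the bound. The function name `invert_gradients`, the bounds `low` and `high`, and the convention that a positive gradient asks to increase the action are all assumptions made for this example; the paper's actual mechanism may differ.

```python
import numpy as np

def invert_gradients(grad, action, low, high):
    """Scale gradient components by the remaining headroom of each action.

    Hypothetical helper, not taken from the paper. Assumes a positive
    gradient component means "increase this action".
    """
    grad = np.asarray(grad, dtype=float)
    action = np.asarray(action, dtype=float)
    width = high - low
    # Headroom towards the upper bound, used when the gradient pushes up.
    up_scale = (high - action) / width
    # Headroom towards the lower bound, used when the gradient pushes down.
    down_scale = (action - low) / width
    return grad * np.where(grad > 0, up_scale, down_scale)

# Example: actions bounded to [0, 1]; the third action has overshot the bound.
scaled = invert_gradients([1.0, -1.0, 1.0], [0.9, 0.9, 1.2], low=0.0, high=1.0)
print(scaled)
```

In this example the first component is damped because the action is already close to the upper bound, the second stays large because there is plenty of room to decrease, and the third changes sign, which pulls the out-of-range action back towards the valid interval.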

CLC number: