ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2020, Vol. 57, Issue (8): 1571-1580. doi: 10.7544/issn1000-1239.2020.20200194

Special topic: 2020 Data Mining and Knowledge Discovery

• Artificial Intelligence •




A Unified Momentum Method with Triple-Parameters and Its Optimal Convergence Rate

Ding Chengcheng¹, Tao Wei², Tao Qing¹

  1. Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031
  2. College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007
  • Online: 2020-08-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61673394) and the Natural Science Foundation of Anhui Province (1908085MF193).

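Abstract (translated from Chinese): Momentum methods have attracted much attention from machine learning researchers because they can improve the convergence performance of SGD (stochastic gradient descent). Following their successful application in deep learning, many variants of momentum methods have emerged. In particular, two unified frameworks have been proposed: SUM (stochastic unified momentum) and QHM (quasi-hyperbolic momentum). However, even for nonsmooth convex optimization problems, obtaining the optimal average convergence rate still relies on unreasonable restrictions, such as a fixed number of iterations and the absence of constraints. To address this, we propose TPUM (triple-parameters unified momentum), a more general unified momentum method with three parameters that contains both SUM and QHM. Then, for constrained nonsmooth convex optimization problems with time-varying step sizes, we prove that TPUM attains the optimal average convergence rate, and we extend the result to the stochastic setting. This guarantees that adding momentum does not degrade the convergence performance of standard gradient descent and that momentum methods are applicable to machine learning problems. Experiments on a typical hinge loss optimization problem with an L1-norm constraint confirm the theoretical analysis.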

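Keywords (translated from Chinese): machine learning, optimization algorithm, nonsmooth condition, momentum methods, average convergence rate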

Abstract: Momentum methods have received much attention in the machine learning community because they can improve the performance of SGD (stochastic gradient descent). Following their successful application in deep learning, many formulations of momentum methods have been presented. In particular, two unified frameworks, SUM (stochastic unified momentum) and QHM (quasi-hyperbolic momentum), were proposed. Unfortunately, even for nonsmooth convex problems, deriving the optimal average convergence rate still relies on several unreasonable limitations, such as requiring the number of iterations to be fixed in advance and restricting the optimization problem to be unconstrained. In this paper, we present TPUM (triple-parameters unified momentum), a more general framework for momentum methods with three parameters, which includes SUM and QHM as special cases. Then, for constrained nonsmooth convex optimization problems with a time-varying step size, we prove that TPUM achieves the optimal average convergence rate. This indicates that adding momentum does not degrade the convergence of SGD, and it provides a theoretical guarantee for the applicability of momentum methods to machine learning problems. Experiments on L1-ball constrained hinge loss problems verify the correctness of the theoretical analysis.

Key words: machine learning, optimization algorithm, non-smooth condition, momentum methods, average convergence rate
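
The setting described in the abstract (a momentum update, a projection onto an L1 ball, a time-varying step size, and an averaged output) can be sketched as follows. This is only an illustration under stated assumptions, not the TPUM algorithm: the abstract does not give the three-parameter update, so the sketch uses the standard QHM recursion, one of the two special cases it names. The 1/sqrt(t) step size, the placement of the projection after the momentum step, the uniform averaging, and all function and parameter names are assumptions made for this example.

```python
import numpy as np

def project_l1_ball(v, radius=1.0):
    """Euclidean projection of v onto {w : ||w||_1 <= radius} (sort-based method)."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]              # magnitudes in decreasing order
    css = np.cumsum(u)
    k = np.arange(1, v.size + 1)
    rho = np.nonzero(u * k > css - radius)[0][-1]
    theta = (css[rho] - radius) / (rho + 1.0)  # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def hinge_loss(w, X, y):
    """Average hinge loss, with labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * (X @ w)).mean()

def hinge_subgradient(w, X, y):
    """One subgradient of the average hinge loss at w."""
    active = (y * (X @ w)) < 1.0               # samples with margin below 1
    return -(X * (y * active)[:, None]).sum(axis=0) / len(y)

def projected_qhm(X, y, radius=1.0, beta=0.9, nu=0.7, steps=500,
                  batch=None, seed=0):
    """Projected QHM-style subgradient method returning the averaged iterate.

    Assumed choices (not taken from the paper): step size 1/sqrt(t),
    projection applied after each momentum step, uniform averaging.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    buf = np.zeros(d)                          # QHM momentum buffer
    w_avg = np.zeros(d)
    for t in range(1, steps + 1):
        # batch=None uses the full data set; otherwise sample a mini-batch
        idx = rng.choice(n, size=batch, replace=False) if batch else slice(None)
        g = hinge_subgradient(w, X[idx], y[idx])
        buf = beta * buf + (1.0 - beta) * g
        alpha = 1.0 / np.sqrt(t)               # time-varying step size
        w = project_l1_ball(w - alpha * ((1.0 - nu) * g + nu * buf), radius)
        w_avg += (w - w_avg) / t               # running average of the iterates
    return w_avg

# Small synthetic problem (hypothetical data, for illustration only)
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 5))
y_demo = np.sign(X_demo @ np.array([1.0, -1.0, 0.0, 0.0, 0.0]))
w_demo = projected_qhm(X_demo, y_demo, radius=1.0)
print("hinge loss at averaged iterate:", hinge_loss(w_demo, X_demo, y_demo))
```

Because the L1 ball is convex, the averaged iterate stays feasible whenever every individual iterate does, which is why the sketch can return the average without projecting again. Setting `nu=1.0` gives a normalized heavy-ball style update and `nu=0.0` gives plain projected subgradient descent, which makes it easy to compare the momentum and non-momentum cases on the same data.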