摘要:
动量方法由于能够改善SGD(stochastic gradient descent)的收敛性能而倍受机器学习研究者的关注.随着其在深度学习的成功应用,动量方法出现了众多形式的变体.特别地,产生了SUM(stochastic unified momentum)和QHM(quasi-hyperbolic momentum)两种统一框架.但是,即使是对非光滑凸优化问题,其最优平均收敛性的获得仍然存在着固定迭代步数和无约束等不合理限制.为此,提出了一种更一般的含三参数的统一化动量方法TPUM(triple-parameters unified momentum),能够同时包含SUM和QHM;其次,针对约束的非光滑凸优化问题,在采取时变步长的条件下,证明了所提出的TPUM具有最优的平均收敛速率,并将其推广到随机情况,从而保证了添加动量不会影响标准梯度下降法的收敛性能以及动量方法对机器学习问题的可应用性.典型的L1范数约束hinge损失函数优化问题实验验证了理论分析的正确性.
Abstract:
Momentum methods have been receiving much attention in machine learning community due to being able to improve the performance of SGD. With the successful application in deep learning, various kinds of formulations for momentum methods have been presented. In particular, two unified frameworks SUM (stochastic unified momentum) and QHM (quasi-hyperbolic momentum) were proposed. Unfortunately, even for nonsmooth convex problems, there still exist several unreasonable limitations such as assuming the performed number of iterations to be predefined and restricting the optimization problems to be unconstrained in deriving the optimal average convergence. In this paper, we present a more general framework for momentum methods with three parameters named TPUM (triple-parameters unified momentum), which includes SUM and QHM as specific examples. Then for constrained nonsmooth convex optimization problems, under the circumstances of using time-varying step size, we prove that TPUM has optimal average convergence. This indicates that adding the momentum will not affect the convergence of SGD and it provides a theoretical guarantee for applicability of momentum methods in machine learning problems. The experiments on L1-ball constrained hinge loss problems verify the correctness of theoretical analysis.