ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2020, Vol. 57 ›› Issue (8): 1571-1580.doi: 10.7544/issn1000-1239.2020.20200194

Special Issue: 2020 Special Topic on Data Mining and Knowledge Discovery


A Unified Momentum Method with Triple-Parameters and Its Optimal Convergence Rate

Ding Chengcheng1, Tao Wei2, Tao Qing1   

  1(Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031); 2(College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007)
  • Online:2020-08-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61673394) and the Natural Science Foundation of Anhui Province (1908085MF193).

Abstract: Momentum methods have received much attention in the machine learning community because they improve the performance of SGD (stochastic gradient descent). Following their successful application in deep learning, various formulations of momentum methods have been proposed; in particular, two unified frameworks, SUM (stochastic unified momentum) and QHM (quasi-hyperbolic momentum), were presented. Unfortunately, even for nonsmooth convex problems, the existing derivations of the optimal average convergence rate impose several unreasonable restrictions, such as assuming that the number of iterations is fixed in advance and that the optimization problem is unconstrained. In this paper, we present a more general framework for momentum methods with three parameters, named TPUM (triple-parameters unified momentum), which includes SUM and QHM as special cases. For constrained nonsmooth convex optimization problems, under time-varying step sizes, we then prove that TPUM achieves the optimal average convergence rate. This indicates that adding momentum does not affect the convergence of SGD, and it provides a theoretical guarantee for the applicability of momentum methods to machine learning problems. Experiments on L1-ball-constrained hinge loss problems verify the correctness of the theoretical analysis.
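The exact TPUM update rule is not reproduced on this page, so as an illustrative sketch of the experimental setting described above, the following shows one of its named special cases, QHM, combined with Euclidean projection onto an L1 ball, applied to a stochastic subgradient of the hinge loss with a time-varying step size. All hyperparameter values (beta, nu, radius, the 1/sqrt(t) schedule) are assumptions for illustration, not the paper's settings.

```python
import numpy as np

def project_l1_ball(v, z=1.0):
    """Euclidean projection of v onto the L1 ball of radius z (sort-based method)."""
    if np.abs(v).sum() <= z:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                 # magnitudes, descending
    css = np.cumsum(u)
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - z))[0][-1]
    theta = (css[rho] - z) / (rho + 1.0)
    return np.sign(v) * np.maximum(np.abs(v) - theta, 0.0)

def projected_qhm_hinge(X, y, radius=1.0, beta=0.9, nu=0.7, steps=2000, seed=0):
    """Projected stochastic QHM on the hinge loss.

    Illustrative only: QHM is a special case of the TPUM framework,
    but this is not the paper's algorithm or parameterization.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    buf = np.zeros(d)                            # momentum buffer
    for t in range(1, steps + 1):
        i = rng.integers(n)
        margin = y[i] * X[i].dot(w)
        # subgradient of max(0, 1 - y <x, w>)
        g = -y[i] * X[i] if margin < 1.0 else np.zeros(d)
        buf = beta * buf + (1.0 - beta) * g
        alpha = 1.0 / np.sqrt(t)                 # time-varying step size, as analyzed in the paper
        # QHM direction: nu-weighted mix of momentum buffer and raw subgradient
        w = project_l1_ball(w - alpha * ((1.0 - nu) * g + nu * buf), radius)
    return w
```

Setting nu=0 recovers plain projected SGD and nu=1 recovers projected SGD with heavy-ball-style momentum, which is the interpolation property that makes QHM (and, per the abstract, TPUM more generally) a unifying framework.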

Key words: machine learning, optimization algorithm, non-smooth condition, momentum methods, average convergence rate
