Ding Chengcheng, Tao Wei, Tao Qing. A Unified Momentum Method with Triple-Parameters and Its Optimal Convergence Rate[J]. Journal of Computer Research and Development, 2020, 57(8): 1571-1580. DOI: 10.7544/issn1000-1239.2020.20200194
1(Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031)
2(College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007)
Funds: This work was supported by the National Natural Science Foundation of China (61673394) and the Natural Science Foundation of Anhui Province (1908085MF193).
Momentum methods have received much attention in the machine learning community owing to their ability to improve the performance of SGD. With their successful application in deep learning, various formulations of momentum methods have been proposed; in particular, two unified frameworks, SUM (stochastic unified momentum) and QHM (quasi-hyperbolic momentum), have been put forward. Unfortunately, even for nonsmooth convex problems, the existing derivations of the optimal average convergence rate suffer from unreasonable limitations, such as assuming that the number of iterations is fixed in advance and restricting the optimization problem to be unconstrained. In this paper, we present a more general three-parameter framework for momentum methods, named TPUM (triple-parameters unified momentum), which includes SUM and QHM as special cases. Then, for constrained nonsmooth convex optimization problems with time-varying step sizes, we prove that TPUM attains the optimal average convergence rate. This shows that adding momentum does not affect the convergence of SGD, and it provides a theoretical guarantee for the applicability of momentum methods to machine learning problems. Experiments on L1-ball-constrained hinge-loss problems verify the correctness of the theoretical analysis.
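As a minimal illustration of the family of methods the abstract describes, the sketch below implements the publicly known two-parameter QHM update (one special case that TPUM contains; the exact three-parameter TPUM update is defined in the paper itself and is not reproduced here), together with Euclidean projection onto an L1 ball, the constraint used in the paper's hinge-loss experiments. Parameter names (`lr`, `beta`, `nu`, `radius`) are illustrative choices, not the paper's notation.

```python
import numpy as np

def qhm_step(theta, buf, grad, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) step: the new iterate moves along
    a nu-weighted mix of the raw gradient and the momentum buffer.
    (Two-parameter special case; TPUM adds a third parameter.)"""
    buf = beta * buf + (1.0 - beta) * grad            # momentum buffer update
    theta = theta - lr * ((1.0 - nu) * grad + nu * buf)
    return theta, buf

def project_l1_ball(v, radius=1.0):
    """Euclidean projection onto {x : ||x||_1 <= radius}, via the standard
    sort-and-threshold scheme, as needed for L1-ball constrained problems."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                      # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho+1) > css[rho] - radius
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - radius))[0][-1]
    tau = (css[rho] - radius) / (rho + 1.0)           # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
```

For a constrained problem, each QHM step would be followed by the projection, giving a projected stochastic momentum iteration of the kind the paper analyzes.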