Ding Chengcheng, Tao Wei, Tao Qing. A Unified Momentum Method with Triple-Parameters and Its Optimal Convergence Rate[J]. Journal of Computer Research and Development, 2020, 57(8): 1571-1580. DOI: 10.7544/issn1000-1239.2020.20200194
1(Department of Information Engineering, Army Academy of Artillery and Air Defense of PLA, Hefei 230031)
2(College of Command and Control Engineering, Army Engineering University of PLA, Nanjing 210007)
Funds: This work was supported by the National Natural Science Foundation of China (61673394) and the Natural Science Foundation of Anhui Province (1908085MF193).
Momentum methods have received much attention in the machine learning community owing to their ability to improve the performance of SGD. With their successful application in deep learning, various formulations of momentum methods have been proposed; in particular, two unified frameworks, SUM (stochastic unified momentum) and QHM (quasi-hyperbolic momentum), have been put forward. Unfortunately, even for nonsmooth convex problems, the existing derivations of the optimal average convergence rate suffer from unreasonable limitations, such as assuming that the number of iterations is fixed in advance and restricting the optimization problem to be unconstrained. In this paper, we present a more general three-parameter framework for momentum methods, named TPUM (triple-parameters unified momentum), which includes SUM and QHM as special cases. Then, for constrained nonsmooth convex optimization problems with time-varying step sizes, we prove that TPUM attains the optimal average convergence rate. This shows that adding momentum does not affect the convergence of SGD, and it provides a theoretical guarantee for the applicability of momentum methods to machine learning problems. Experiments on L1-ball-constrained hinge-loss problems verify the correctness of the theoretical analysis.
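As a minimal illustration of the family of methods the abstract describes, the sketch below implements the publicly known two-parameter QHM update (one special case that TPUM contains; the exact three-parameter TPUM update is defined in the paper itself and is not reproduced here), together with Euclidean projection onto an L1 ball, the constraint used in the paper's hinge-loss experiments. Parameter names (`lr`, `beta`, `nu`, `radius`) are illustrative choices, not the paper's notation.

```python
import numpy as np

def qhm_step(theta, buf, grad, lr=0.1, beta=0.9, nu=0.7):
    """One quasi-hyperbolic momentum (QHM) step: the new iterate moves along
    a nu-weighted mix of the raw gradient and the momentum buffer.
    (Two-parameter special case; TPUM adds a third parameter.)"""
    buf = beta * buf + (1.0 - beta) * grad            # momentum buffer update
    theta = theta - lr * ((1.0 - nu) * grad + nu * buf)
    return theta, buf

def project_l1_ball(v, radius=1.0):
    """Euclidean projection onto {x : ||x||_1 <= radius}, via the standard
    sort-and-threshold scheme, as needed for L1-ball constrained problems."""
    if np.abs(v).sum() <= radius:
        return v.copy()
    u = np.sort(np.abs(v))[::-1]                      # sorted magnitudes, descending
    css = np.cumsum(u)
    # largest index rho with u[rho] * (rho+1) > css[rho] - radius
    rho = np.nonzero(u * np.arange(1, len(u) + 1) > (css - radius))[0][-1]
    tau = (css[rho] - radius) / (rho + 1.0)           # soft-threshold level
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)
```

For a constrained problem, each QHM step would be followed by the projection, giving a projected stochastic momentum iteration of the kind the paper analyzes.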