ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (9): 1901-1910. doi: 10.7544/issn1000-1239.2014.20140161

Special Topic: 2014 Deep Learning

• Artificial Intelligence •

A Sparse Stochastic Algorithm with O(1/T) Convergence Rate

Jiang Jiyuan, Xia Liang, Zhang Xian, and Tao Qing

  1. (Army Officer Academy of PLA, Hefei 230031) (jyjianggle@gmail.com)
  • Online: 2014-09-01
  • Supported by: the National Natural Science Foundation of China (61273296, 60975040) and the Natural Science Foundation of Anhui Province (1308085QF121)


Abstract: Stochastic gradient descent (SGD) is a simple but efficient method for large-scale optimization problems. Recent research has shown that its convergence rate can be effectively improved by the α-suffix averaging technique when solving strongly convex problems. However, SGD is a black-box method, so it is difficult to obtain the structural effect expected from regularized optimization problems. On the other hand, composite objective mirror descent (COMID) in the stochastic setting is a scalable algorithm that can effectively keep the sparsity imposed by L1 regularization, but it only achieves an O(log T/T) convergence rate for strongly convex optimization problems. In this paper, we focus on the generally convex optimization problem of the “L1 + Hinge” form. To make full use of the α-suffix averaging technique, we first convert it into a strongly convex optimization problem by adding an L2 strongly convex term. We then present the L1MD-α algorithm, which combines COMID with the α-suffix averaging technique, and prove that it achieves an O(1/T) convergence rate. As a result, L1MD-α obtains not only a faster convergence rate but also better sparsity than COMID. Experiments on typical large-scale datasets verify the correctness of the theoretical analysis and the effectiveness of the proposed algorithm.
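To make the mechanism concrete, the minimal sketch below (not the authors' exact L1MD-α implementation) illustrates the two ingredients the abstract combines: a COMID step with a Euclidean Bregman divergence, whose closed-form L1 proximal update (soft-thresholding) preserves sparsity, and α-suffix averaging, which averages only the last αT iterates. The objective form, the parameter names (lam1, lam2, alpha), and the 1/(lam2·t) step size are illustrative assumptions rather than the paper's exact settings.

import numpy as np

def l1md_alpha_sketch(X, y, lam1=1e-4, lam2=1e-2, T=10000, alpha=0.5, seed=0):
    # Sketch of a COMID-style sparse stochastic update with alpha-suffix averaging,
    # assuming the composite objective
    #   lam1*||w||_1 + (lam2/2)*||w||_2^2 + mean_i max(0, 1 - y_i*<w, x_i>).
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    suffix_start = int((1 - alpha) * T)   # average only the last alpha*T iterates
    w_avg, count = np.zeros(d), 0
    for t in range(1, T + 1):
        i = rng.integers(n)               # sample one training example
        margin = y[i] * X[i].dot(w)
        # subgradient of the non-composite part: hinge loss plus L2 strongly convex term
        g = lam2 * w - (y[i] * X[i] if margin < 1 else 0.0)
        eta = 1.0 / (lam2 * t)            # step size for a lam2-strongly convex objective
        # COMID step with Euclidean Bregman divergence: gradient step, then the
        # closed-form prox of the L1 term (soft-thresholding), which keeps sparsity
        v = w - eta * g
        w = np.sign(v) * np.maximum(np.abs(v) - eta * lam1, 0.0)
        if t > suffix_start:              # alpha-suffix averaging of the last iterates
            count += 1
            w_avg += (w - w_avg) / count
    return w_avg

Soft-thresholding is what gives COMID-style updates their sparsity advantage over plain SGD, and, as the abstract argues, restricting the average to the last αT iterates is what lifts the rate for the strongly convex problem from O(log T/T) to O(1/T).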

Key words: machine learning, stochastic optimization, sparsity, L1 regularization, COMID (composite objective mirror descent)
