    Citation: Kong Kang, Tao Qing, Wang Qunshan, Chu Dejun. A Sub-Gradient Based Solver for L1-Regularization+Hinge-Loss Problem[J]. Journal of Computer Research and Development, 2012, 49(7): 1494-1499.


    A Sub-Gradient Based Solver for L1-Regularization+Hinge-Loss Problem


      Abstract: Hinge loss is central to the success of support vector machines (SVMs) in machine learning, and L1 regularization plays a crucial role in sparse learning, which is essential for large-scale classification problems. However, both the hinge loss and the L1 regularizer are non-differentiable, so higher-order gradient information is unavailable. In this paper, the optimization problem of L1 regularization plus hinge loss is systematically investigated using sub-gradient methods. We first describe the direct sub-gradient method and the projected sub-gradient method in a stochastic setting, and then analyze the convergence and the convergence rate of the stochastic projected sub-gradient method. Experimental results on large-scale text classification data show that, on large-scale sparse problems, the stochastic projected sub-gradient method converges faster and yields sparser solutions, with many elements of the weight vector exactly zero. We further demonstrate how the projection threshold affects the sparsity of the algorithms.
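
    For reference, the "L1 regularization plus hinge loss" objective studied here has the standard form below (the symbols λ for the regularization weight and (x_i, y_i) for the training pairs are our notation, not taken from the paper):

        \min_{w \in \mathbb{R}^d} \; \lambda \|w\|_1 + \frac{1}{n} \sum_{i=1}^{n} \max\bigl(0,\, 1 - y_i \langle w, x_i \rangle\bigr)

    Both terms are convex but non-differentiable (the L1 norm at zero coordinates, the hinge at unit margin), which is why the paper works with sub-gradients rather than gradients or higher-order derivatives.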

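    The sketch below (Python, our own illustrative code, not the authors' implementation) shows one plausible form of the two stochastic updates the abstract contrasts. The direct method uses λ·sign(w) as a sub-gradient of the L1 term; the projected variant instead applies soft-thresholding after the hinge step, truncating entries whose magnitude falls below the threshold η_t·λ, which is what produces exact zeros in the weight vector. The paper's precise projection and threshold may differ.

    import numpy as np

    def soft_threshold(w, tau):
        # Component-wise soft-thresholding: entries with |w_j| <= tau become
        # exactly zero, which is the source of the sparsity the abstract reports.
        return np.sign(w) * np.maximum(np.abs(w) - tau, 0.0)

    def stochastic_subgradient(X, y, lam, T, eta0, projected=True, seed=None):
        # Minimize lam*||w||_1 + (1/n) * sum_i max(0, 1 - y_i * <w, x_i>)
        # by sampling one example per iteration (illustrative sketch; X is a
        # dense (n, d) array, y has entries in {-1, +1}).
        rng = np.random.default_rng(seed)
        n, d = X.shape
        w = np.zeros(d)
        for t in range(1, T + 1):
            eta = eta0 / np.sqrt(t)  # standard O(1/sqrt(t)) step size
            i = rng.integers(n)
            # Sub-gradient of the hinge term at example i.
            g = -y[i] * X[i] if y[i] * X[i].dot(w) < 1.0 else np.zeros(d)
            if projected:
                # Projected/truncated step: gradient step on the hinge term,
                # then soft-threshold at eta*lam to handle the L1 term.
                w = soft_threshold(w - eta * g, eta * lam)
            else:
                # Direct step: lam*sign(w) is a sub-gradient of lam*||w||_1.
                w = w - eta * (g + lam * np.sign(w))
        return w

    For example, calling stochastic_subgradient(X, y, lam=1e-4, T=10 * len(y), eta0=1.0) and then raising lam (and hence the truncation threshold eta*lam) zeroes more coordinates of w, matching the abstract's observation that the projection threshold controls the sparsity of the solution.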