
    Solving Large-Scale SVM Based on an Explicit Description of an Approximate Gaussian Kernel

    Approximate Gaussian Kernel for Large-Scale SVM

    • Abstract: Training a nonlinear support vector machine (support vector machine, SVM) on large-scale datasets is prohibitively expensive, whereas efficient solvers exist for the linear SVM. To apply efficient linear-SVM solvers to nonlinear SVM while preserving its accuracy, a method for solving large-scale SVM based on an explicit description of an approximate Gaussian kernel is proposed. First, the approximate Gaussian kernel is defined, its relationship to the Gaussian kernel is established, and an upper bound on the deviation between the two kernels is derived. Then, an explicit description of the reproducing kernel Hilbert space (RKHS) induced by the approximate Gaussian kernel is given, which makes it possible to characterize the structure of the SVM solution exactly and enhances the interpretability of the SVM method. Finally, the feature mapping induced by the approximate Gaussian kernel is constructed explicitly and its output is used as the input of a linear SVM, so that large-scale nonlinear SVM can be solved efficiently with linear-SVM algorithms. Experimental results show that the proposed method improves the efficiency of nonlinear SVM training and achieves accuracy close to that of standard nonlinear SVM.


      Abstract: Training a support vector machine (SVM) with a nonlinear kernel function on large-scale data is usually very time-consuming; in contrast, much faster solvers exist for the linear SVM. To exploit the computational efficiency of linear SVM without sacrificing the accuracy of the nonlinear one, in this paper, we present a method for solving large-scale nonlinear SVM based on an explicit description of an approximate Gaussian kernel. We first define the approximate Gaussian kernel, establish its connection to the Gaussian kernel, and derive an error bound between the two kernel functions. We then present an explicit description of the reproducing kernel Hilbert space (RKHS) induced by the approximate Gaussian kernel. This allows the structure of the SVM solution to be characterized exactly, which enhances the interpretability of the model and deepens understanding of the method. Finally, we explicitly construct the feature mapping induced by the approximate Gaussian kernel and use the mapped data as the input of a linear SVM. In this way, existing efficient linear-SVM solvers can be used to train a nonlinear SVM on large-scale data. Experimental results show that the proposed method is efficient and achieves classification accuracy comparable to that of a standard nonlinear SVM.
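    The abstract does not specify the paper's particular construction of the approximate Gaussian kernel, but the overall pipeline it describes (build an explicit feature map that approximates the Gaussian kernel, then hand the mapped data to a fast linear-SVM solver) can be sketched with standard stand-ins. The sketch below uses random Fourier features (Rahimi and Recht) as the explicit map and the Pegasos stochastic subgradient solver as the linear SVM; both are illustrative assumptions, not the authors' method, and the toy dataset, dimensions, and hyperparameters are likewise hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "concentric circles" data: not linearly separable in the input space,
# so a plain linear SVM on the raw features would fail here.
n = 200
theta = rng.uniform(0.0, 2.0 * np.pi, n)
r = np.where(np.arange(n) < n // 2, 0.5, 1.5) + rng.normal(0.0, 0.05, n)
X = np.c_[r * np.cos(theta), r * np.sin(theta)]
y = np.where(np.arange(n) < n // 2, 1.0, -1.0)

def rff_map(X, D=200, gamma=1.0, seed=1):
    """Explicit random-Fourier-feature map z(x) such that
    z(x).dot(z(v)) approximates the Gaussian kernel exp(-gamma*||x-v||^2)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), (d, D))  # frequencies ~ N(0, 2*gamma*I)
    b = rng.uniform(0.0, 2.0 * np.pi, D)               # random phase offsets
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

def pegasos(X, y, lam=1e-3, epochs=50, seed=2):
    """Pegasos: stochastic subgradient solver for the linear SVM objective
    (lam/2)*||w||^2 + mean(hinge loss). Stands in for any fast linear solver."""
    rng = np.random.default_rng(seed)
    w = np.zeros(X.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            if y[i] * (X[i] @ w) < 1.0:   # margin violated: move toward y[i]*X[i]
                w = (1.0 - eta * lam) * w + eta * y[i] * X[i]
            else:                         # margin satisfied: shrink only
                w = (1.0 - eta * lam) * w
    return w

# Map the data explicitly, then train a *linear* SVM on the mapped features.
Z = rff_map(X)
w = pegasos(Z, y)
acc = np.mean(np.sign(Z @ w) == y)
```

    The key point the abstract makes is visible here: the nonlinear decision boundary comes entirely from the explicit feature map, while the training itself runs in the linear-SVM regime, whose cost scales with the number of samples rather than requiring a kernel matrix.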
