ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2014, Vol. 51, Issue (10): 2171-2177. doi: 10.7544/issn1000-1239.2014.20130825

• Artificial Intelligence •

Approximate Gaussian Kernel for Large-Scale SVM

Liu Yong, Jiang Shali, Liao Shizhong

  1. (School of Computer Science and Technology, Tianjin University, Tianjin 300072) (szliao@tju.edu.cn)
  • Online: 2014-10-01
  • Supported by:
    National Natural Science Foundation of China (60933005); National High Technology Research and Development Program of China (863 Program) (2011AA010702, 2012AA01A401, 2012AA01A402)


Abstract: Training support vector machines (SVMs) with nonlinear kernel functions on large-scale data is usually very time consuming; in contrast, highly efficient solvers exist for linear SVMs. To utilize the computational efficiency of linear SVMs without sacrificing the accuracy of nonlinear ones, this paper presents a method for solving large-scale nonlinear SVMs based on an explicit description of an approximate Gaussian kernel. We first define the approximate Gaussian kernel, establish its connection to the Gaussian kernel, and derive an error bound between the two kernel functions. Then, we present an explicit description of the reproducing kernel Hilbert space (RKHS) induced by the approximate Gaussian kernel. This allows us to characterize exactly the structure of the SVM solutions, which enhances the interpretability of the model and deepens the understanding of the method. Finally, we explicitly construct the feature mapping induced by the approximate Gaussian kernel and use the mapped data as the input to a linear SVM. In this way, existing efficient linear SVM solvers can be used to train nonlinear SVMs on large-scale data. Experimental results show that the proposed method is efficient and achieves classification accuracy comparable to that of the standard nonlinear SVM.
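
The pipeline described in the abstract, namely building an explicit finite-dimensional feature map that approximates the Gaussian kernel and then handing the mapped data to a fast linear SVM solver, can be illustrated with a minimal sketch. The sketch below uses scikit-learn's RBFSampler (a random Fourier feature approximation) as a stand-in for the paper's approximate Gaussian kernel and LinearSVC as the linear solver; these components, together with the gamma, n_components, and synthetic-data settings, are illustrative assumptions rather than the authors' construction.

    # Minimal sketch: explicit kernel approximation + linear SVM solver.
    # RBFSampler here is an assumed stand-in for the paper's approximate
    # Gaussian kernel feature map, not the authors' construction.
    from sklearn.datasets import make_classification
    from sklearn.kernel_approximation import RBFSampler
    from sklearn.svm import LinearSVC

    # Synthetic data standing in for a large-scale training set.
    X, y = make_classification(n_samples=10000, n_features=50, random_state=0)

    # Explicit finite-dimensional feature map z(x), chosen so that
    # z(x)^T z(x') approximates the Gaussian kernel exp(-gamma * ||x - x'||^2).
    feature_map = RBFSampler(gamma=0.1, n_components=500, random_state=0)
    Z = feature_map.fit_transform(X)

    # A fast linear SVM solver applied to the mapped data approximates the
    # nonlinear (Gaussian-kernel) SVM at a fraction of the training cost.
    clf = LinearSVC(C=1.0, max_iter=5000)
    clf.fit(Z, y)
    print("training accuracy:", clf.score(Z, y))

In the paper's setting the feature map is instead constructed explicitly from the approximate Gaussian kernel itself, but the same two-stage use with any off-the-shelf linear SVM solver applies.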

Key words: support vector machine, linear support vector machine, kernel methods, approximate Gaussian kernel, reproducing kernel Hilbert space

CLC number: