ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2016, Vol. 53 ›› Issue (9): 1971-1978. doi: 10.7544/issn1000-1239.2016.20150489

• Artificial Intelligence •

随机傅里叶特征空间中高斯核支持向量机模型选择

冯昌,廖士中   

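  1. (School of Computer Science and Technology, Tianjin University, Tianjin 300350) (changfeng@tju.edu.cn)
  • Published: 2016-09-01
  • Supported by:
    National Natural Science Foundation of China (61170019)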

Model Selection for Gaussian Kernel Support Vector Machines in Random Fourier Feature Space

Feng Chang, Liao Shizhong   

  1. (School of Computer Science and Technology, Tianjin University, Tianjin 300350)
  • Online: 2016-09-01

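Abstract (translated from the Chinese): Model selection is a key problem in learning support vector machines (SVMs). Standard SVM learning essentially solves a convex quadratic optimization problem whose time complexity is cubic in the data size, and classical model selection methods usually need to train SVMs many times. Such methods are already computationally expensive for medium-scale SVM learning and are even harder to extend to large-scale SVM learning. Based on the random Fourier feature approximation of the Gaussian kernel, a new and efficient model selection method for kernel SVMs is proposed. First, the random Fourier feature mapping is used to embed the infinite-dimensional implicit feature space into a relatively low-dimensional explicit random feature space, and an upper bound is derived on the error between the models obtained by training SVMs in the two feature spaces. Then, with this upper bound as the theoretical guarantee, a model selection method for kernel SVMs in the random feature space is proposed: linear SVMs in the random feature space are used to approximate the kernel SVMs, and the approximate value of the model selection criterion is computed to assess the relative quality of the corresponding kernel SVMs. Finally, the feasibility and efficiency of the proposed method are verified on benchmark datasets. The experimental results show that the proposed method achieves test accuracy roughly on par with standard cross validation while significantly improving the efficiency of model selection for kernel SVMs.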

Abstract: Model selection is critical to learning support vector machines (SVMs). Training a standard SVM amounts to solving a convex quadratic programming problem, which takes time cubic in the data size. Classical model selection methods, however, usually need to train hundreds or thousands of SVMs, which is prohibitively time-consuming for medium-scale datasets and very difficult to scale up to large-scale problems. In this paper, a novel and efficient approach to model selection for kernel SVMs is proposed, based on approximating the Gaussian kernel with random Fourier features. First, the random Fourier feature mapping is used to embed the infinite-dimensional implicit feature space into a relatively low-dimensional explicit random feature space, and an upper bound is derived on the error between the exact model obtained by training the kernel SVM and the approximate model returned by the linear SVM in the random feature space. Then, a model selection approach for kernel SVMs in the random feature space is presented. Under the guarantee of this model error bound, linear SVMs in the random feature space are used to approximate the corresponding kernel SVMs, so that an approximate value of the model selection criterion can be computed efficiently and used to assess the relative goodness of the corresponding kernel SVMs. Finally, comparison experiments with cross validation model selection on benchmark datasets show that the proposed approach significantly improves the efficiency of model selection for kernel SVMs while achieving test accuracy comparable to that of standard cross validation.
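The random Fourier feature approximation that the abstract builds on can be illustrated with a minimal sketch. It assumes the Gaussian kernel is parameterized as k(x, y) = exp(-gamma * ||x - y||^2) and uses the standard cosine feature map with random phases. The function names, the sample points, and the number of features are illustrative assumptions and are not taken from the paper.

```python
import math
import random


def gaussian_kernel(x, y, gamma):
    """Exact Gaussian kernel k(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def sample_rff_params(dim, num_features, gamma, rng):
    """Draw frequencies w ~ N(0, 2*gamma*I) and phases b ~ Uniform[0, 2*pi]."""
    std = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, std) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    return freqs, phases


def rff_map(x, freqs, phases):
    """Explicit random feature map z(x), built so that z(x) . z(y) ~= k(x, y)."""
    scale = math.sqrt(2.0 / len(freqs))
    return [scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
            for w, b in zip(freqs, phases)]


# Illustrative inputs (assumed values, not from the paper).
rng = random.Random(0)
x = [0.5, -1.0, 0.3]
y = [0.1, -0.7, 0.9]
gamma = 0.5

freqs, phases = sample_rff_params(dim=3, num_features=2000, gamma=gamma, rng=rng)
exact = gaussian_kernel(x, y, gamma)
approx = sum(a * b for a, b in zip(rff_map(x, freqs, phases),
                                   rff_map(y, freqs, phases)))
print("exact kernel value:      ", exact)
print("random-feature estimate: ", approx)
```

The inner product of the two explicit feature vectors is an unbiased Monte Carlo estimate of the kernel value, and its error shrinks as the number of random features grows. This is what makes it plausible to replace a kernel SVM by a linear SVM in the random feature space.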

Key words: model selection, support vector machines (SVMs), random Fourier features, Gaussian kernel, cross validation
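Cross validation model selection in the random feature space, as described in the abstract, might look roughly like the following sketch. It is not the authors' implementation. The linear SVM is trained with a Pegasos-style stochastic subgradient method without a bias term, which is an assumed stand-in for whatever linear solver the paper uses. The candidate grids, the toy two-ring dataset, and all function names are hypothetical as well.

```python
import math
import random


def rff_transform(data, gamma, num_features, rng):
    """Map points to random Fourier features for exp(-gamma * ||x - y||^2)."""
    dim = len(data[0])
    std = math.sqrt(2.0 * gamma)
    freqs = [[rng.gauss(0.0, std) for _ in range(dim)]
             for _ in range(num_features)]
    phases = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(num_features)]
    scale = math.sqrt(2.0 / num_features)
    return [[scale * math.cos(sum(w_i * x_i for w_i, x_i in zip(w, x)) + b)
             for w, b in zip(freqs, phases)] for x in data]


def train_linear_svm(feats, labels, lam, epochs, rng):
    """Pegasos-style SGD on the regularized hinge loss (assumed solver)."""
    w = [0.0] * len(feats[0])
    t = 0
    for _ in range(epochs):
        order = list(range(len(feats)))
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)
            margin = labels[i] * sum(w_j * z_j for w_j, z_j in zip(w, feats[i]))
            w = [(1.0 - eta * lam) * w_j for w_j in w]
            if margin < 1.0:
                w = [w_j + eta * labels[i] * z_j
                     for w_j, z_j in zip(w, feats[i])]
    return w


def cv_error(feats, labels, lam, folds, rng, epochs=3):
    """k-fold cross validation error of the linear SVM on explicit features."""
    n = len(feats)
    mistakes = 0
    for k in range(folds):
        test_idx = set(range(k, n, folds))
        train_idx = [i for i in range(n) if i not in test_idx]
        w = train_linear_svm([feats[i] for i in train_idx],
                             [labels[i] for i in train_idx], lam, epochs, rng)
        for i in test_idx:
            score = sum(w_j * z_j for w_j, z_j in zip(w, feats[i]))
            if (1 if score >= 0.0 else -1) != labels[i]:
                mistakes += 1
    return mistakes / n


def select_model(data, labels, gammas, lams, num_features=50, folds=5, seed=0):
    """Return (cv_error, gamma, lam) with the lowest approximate CV error."""
    rng = random.Random(seed)
    best = None
    for gamma in gammas:
        # One feature map per kernel width, reused for every regularization value.
        feats = rff_transform(data, gamma, num_features, rng)
        for lam in lams:
            err = cv_error(feats, labels, lam, folds, rng)
            if best is None or err < best[0]:
                best = (err, gamma, lam)
    return best


# Toy data (assumed): two noisy concentric rings, not linearly separable in 2-D.
data_rng = random.Random(1)
data, labels = [], []
for i in range(80):
    radius, label = (0.5, 1) if i % 2 == 0 else (2.0, -1)
    angle = data_rng.uniform(0.0, 2.0 * math.pi)
    r = radius + data_rng.gauss(0.0, 0.1)
    data.append([r * math.cos(angle), r * math.sin(angle)])
    labels.append(label)

print(select_model(data, labels, gammas=[0.1, 1.0, 10.0], lams=[0.001, 0.01]))
```

Each candidate is evaluated by training only linear models on explicit features, so the per-candidate cost depends on the number of random features rather than on solving a kernel quadratic program. The sketch does not reproduce the error bound derived in the paper.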
