
    A Support Vector Machine Learning Method Based on Granule Shift Parameter

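    • Abstract: In practical applications, imbalance in the sample size and distribution density of a data set may cause the separating hyperplane obtained by a traditional support vector machine (SVM) to be suboptimal. Building on an analysis of the optimal separating hyperplane of the traditional SVM and drawing on granular computing (GrC) theory, this paper proposes a granular support vector machine (granular SVM, GSVM) learning method based on a granule shift factor, called S_GSVM, for data sets whose size and distribution density are imbalanced. The method maps the original samples into a high-dimensional space with a Mercer kernel and then granulates the data in that space. A separate hyperplane shift factor is computed for each granule, and the convex quadratic optimization problem of the SVM is reformulated with these factors to obtain a separating hyperplane with better generalization ability. S_GSVM takes full account of the effect of complex data distributions on generalization ability and thereby improves on the maximum-margin separating hyperplane. Experimental results show that S_GSVM achieves good generalization performance on imbalanced data sets.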

       

      Abstract: In practical applications, data size and distribution density are often imbalanced. Because data size and density distribution change the probability that a sample falls into a given region, the maximum-margin hyperplane obtained by a traditional support vector machine (SVM) may not be optimal. Drawing on granular computing (GrC) theory, an improved granular support vector machine (GSVM) model based on a granule shift parameter, named S_GSVM, is presented for imbalanced data classification problems. In the S_GSVM model, the original data are first mapped into a high-dimensional feature space by a Mercer kernel, and the mapped data are then granulated in that space. Two granule factors, support and disperse, are defined to measure the influence of the sample distribution on SVM performance, and the shift parameter of each granule is computed from them. Based on these shift parameters, a new convex quadratic optimization problem is constructed and solved. By fully considering the influence of the data distribution on generalization performance, the proposed S_GSVM model improves on the hyperplane obtained from the maximum-margin criterion alone. Experimental results on benchmark datasets and on a database of interacting proteins demonstrate the effectiveness and efficiency of the proposed S_GSVM model.
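
      The abstract names two per-granule factors, support and disperse, but does not give their formulas. The following is only a minimal sketch under stated assumptions: support is taken to be the fraction of all samples that fall in a granule, and disperse is taken to be the mean feature-space distance from the granule's members to the granule mean, evaluated with the kernel trick. The function names `rbf_kernel` and `granule_stats`, the RBF kernel choice, and both definitions are hypothetical illustrations, not the paper's method, and the sketch stops short of the shift parameter and the reformulated optimization problem.

```python
import math


def rbf_kernel(x, z, gamma=0.5):
    """Gaussian (RBF) kernel, one common Mercer kernel (assumed choice)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, z)))


def granule_stats(granules, kernel=rbf_kernel):
    """Return {name: (support, dispersion)} for a dict of name -> list of points.

    support    = share of all samples that fall in the granule (assumed definition)
    dispersion = mean feature-space distance from members to the granule mean
                 (assumed definition), computed without explicit feature maps.
    """
    total = sum(len(pts) for pts in granules.values())
    stats = {}
    for name, pts in granules.items():
        n = len(pts)
        # Squared norm of the granule mean in feature space:
        # (1/n^2) * sum_i sum_j k(x_i, x_j)
        mean_sq = sum(kernel(a, b) for a in pts for b in pts) / (n * n)
        dists = []
        for x in pts:
            # Inner product of phi(x) with the granule mean: (1/n) * sum_j k(x, x_j)
            cross = sum(kernel(x, z) for z in pts) / n
            d2 = kernel(x, x) - 2.0 * cross + mean_sq
            dists.append(math.sqrt(max(d2, 0.0)))  # clamp tiny negative round-off
        stats[name] = (n / total, sum(dists) / n)
    return stats


# Toy data: one dense granule and one sparse granule (made-up points).
granules = {
    "dense": [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)],
    "sparse": [(3.0, 3.0), (5.0, 4.0)],
}
stats = granule_stats(granules)
for name, (support, dispersion) in stats.items():
    print(f"{name}: support={support:.3f}, dispersion={dispersion:.3f}")
```

      On this toy data the dense granule has the larger support and the smaller dispersion. A method like the one the abstract describes would turn such quantities into a per-granule shift of the hyperplane, but how that is done cannot be recovered from the abstract alone.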

       
