Abstract:
In practical applications, data size and distribution density are often imbalanced. Because samples fall into different regions with different probabilities, depending on data size and density distribution, the maximum-margin hyperplane obtained by a traditional support vector machine (SVM) may not be optimal. Drawing on granular computing (GrC) theory, an improved granular support vector machine (GSVM) model based on a granule shift parameter, named S_GSVM, is presented for imbalanced data classification. In the S_GSVM model, the original data are first mapped into a high-dimensional feature space by a Mercer kernel, and the mapped data are then granulated in that space. Two granule factors, support and disperse, are defined to measure the influence of the sample distribution on SVM performance. The shift parameter of each granule is then computed from its support and disperse. Based on these shift parameters, a new convex quadratic optimization problem is constructed and solved. By taking into account the influence of the data distribution on generalization performance, the proposed S_GSVM model improves on the hyperplane obtained from the maximum-margin criterion alone. Experimental results on benchmark datasets and the Database of Interacting Proteins demonstrate the effectiveness and efficiency of the proposed S_GSVM model.
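The granulation step described above can be illustrated with a minimal sketch. The abstract does not give the formulas for support and disperse, so the definitions here are assumptions made only for illustration: support is taken as the fraction of all samples that fall in a granule, and disperse as the mean feature-space distance from a granule's samples to the granule center, computed through the kernel trick. The function names `rbf_kernel` and `granule_factors`, the `gamma` parameter, and the precomputed `granule_ids` are all hypothetical; the shift parameter itself is not computed because its formula is not stated here.

```python
import numpy as np


def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian (RBF) kernel matrix, one common choice of Mercer kernel."""
    sq = (X ** 2).sum(axis=1)[:, None] + (Y ** 2).sum(axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def granule_factors(K, granule_ids):
    """Return {granule_id: (support, disperse)} under the ASSUMED definitions.

    support  = |granule| / n                      (assumption, not from the paper)
    disperse = mean ||phi(x) - center|| in the kernel feature space (assumption)
    """
    granule_ids = np.asarray(granule_ids)
    n = K.shape[0]
    factors = {}
    for g in np.unique(granule_ids):
        idx = np.flatnonzero(granule_ids == g)
        Kg = K[np.ix_(idx, idx)]
        # Kernel trick: ||phi(x_i) - mu||^2 = K_ii - 2 * mean_j K_ij + mean_jl K_jl
        d2 = np.diag(Kg) - 2.0 * Kg.mean(axis=1) + Kg.mean()
        support = len(idx) / n
        disperse = float(np.sqrt(np.maximum(d2, 0.0)).mean())
        factors[int(g)] = (support, disperse)
    return factors


# Toy usage: a dense majority granule and a single-sample minority granule.
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0]])
ids = np.array([0, 0, 0, 1])
factors = granule_factors(rbf_kernel(X, X, gamma=1.0), ids)
```

Under these assumed definitions, a granule with many tightly packed samples gets a high support and a low disperse, while a sparse or small granule gets the opposite. That is the kind of distribution information the abstract says is used to shift the hyperplane.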