ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (11): 2361-2371. doi: 10.7544/issn1000-1239.2018.20170514

A GEV-Based Classification Algorithm for Imbalanced Data

Fu Junjie, Liu Gongshen   

  1. (School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University, Shanghai 200240) (tianzhiyinyi@sjtu.edu.cn)
  • Online: 2018-11-01

Abstract: Binary classification with imbalanced data arises in many fields and remains an open problem. Beyond predicting a class label directly, many applications also need the probability that a sample belongs to a given class. However, most existing research focuses on classification performance and neglects probability estimation. This paper aims to improve class probability estimation (CPE) while preserving classification performance. We propose a new regression approach that adopts the generalized linear model as its basic framework and uses the calibration loss function as the optimization objective. Because the generalized extreme value (GEV) distribution is asymmetric and flexible, we use it to formulate the link function, which suits binary classification with imbalanced data. For model estimation, the shape parameter strongly influences modeling precision, so we propose two methods for estimating the shape parameter of the GEV distribution. Experiments on synthetic datasets confirm the accuracy of the shape parameter estimation. Experimental results on real data suggest that the proposed approach, compared with three other commonly used regression algorithms, performs well in both classification and CPE. The proposed algorithm also outperforms other optimization algorithms in computational efficiency.
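
To illustrate the idea of a GEV-based link function, the following minimal sketch maps a linear predictor to a class probability through the standard GEV cumulative distribution function. The abstract does not state the exact parameterization, so the form used here (location 0, scale 1, shape parameter `xi`) is an assumption, and the names `gev_cdf`, `predict_proba`, `weights`, and `bias` are hypothetical and chosen only for illustration.

```python
import math

def gev_cdf(x, xi):
    """Standard GEV CDF (location 0, scale 1; an assumed parameterization)."""
    if abs(xi) < 1e-12:
        log_s = -x  # Gumbel limit as xi -> 0: F(x) = exp(-exp(-x))
    else:
        t = 1.0 + xi * x
        if t <= 0.0:
            # Outside the support: 0 below the lower bound (xi > 0),
            # 1 above the upper bound (xi < 0).
            return 0.0 if xi > 0 else 1.0
        log_s = -math.log(t) / xi  # log of (1 + xi*x)^(-1/xi)
    if log_s > 700.0:
        return 0.0  # exp(log_s) would overflow; the CDF is effectively 0
    return math.exp(-math.exp(log_s))

def predict_proba(features, weights, bias, xi):
    """Class probability from a linear predictor passed through the GEV CDF."""
    eta = bias + sum(w * x for w, x in zip(weights, features))
    return gev_cdf(eta, xi)

# A linear predictor of 0 gives exp(-1), about 0.368, rather than 0.5.
print(predict_proba([1.0, 2.0], [0.5, -0.25], 0.0, 0.0))
```

Unlike the symmetric logistic link, this response is not centered at 0.5, and the shape parameter `xi` controls how skewed it is. That skew is the kind of asymmetry the abstract points to as useful for imbalanced classes.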

Key words: linear model, extreme value distribution, imbalanced data, classification, probability estimation