Balancing Method for Skewed Training Set in Data Mining
Abstract
Classification is one of the central tasks in data mining. The training sets extracted for training classifiers are usually skewed, and traditional classification algorithms tend to give low predictive accuracy on the minority classes of such sets. Existing balancing algorithms handle only data sets that contain two classes of cases. To balance training sets that have several classes, an algorithm called SSGP is introduced, based on the idea that cases of the same class differ little from one another. SSGP forms new minority class cases by interpolating between several minority class cases that lie close together, and ensures that the number of cases in every minority class grows at the same rate. It is proved that SSGP does not add noise to the data set. To improve efficiency, SSGP uses the modulus instead of computing a large number of dissimilarities between cases. The experimental results show that SSGP can improve the predictive accuracy of several minority classes in a single run.
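The interpolation idea in the abstract can be illustrated with a small Python sketch. It is not the SSGP algorithm itself, whose details are not given here. It assumes that "modulus" means the Euclidean norm of a numeric case, that cases adjacent in modulus order can stand in for cases that lie together, and that interpolating between two cases (rather than several) is enough for illustration. The names `modulus`, `balance_by_interpolation`, `rounds`, and `seed` are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict


def modulus(case):
    """Euclidean norm of a numeric case (assumed meaning of 'modulus')."""
    return math.sqrt(sum(v * v for v in case))


def balance_by_interpolation(cases, labels, rounds, seed=0):
    """Add `rounds` synthetic cases to every minority class.

    Each round adds exactly one case to each minority class, so all
    minority classes grow at the same rate.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for case, label in zip(cases, labels):
        by_class[label].append(list(case))

    majority_size = max(len(group) for group in by_class.values())
    # Minority classes: smaller than the largest class, with at least two
    # cases so that there is a pair to interpolate between.
    minority = {
        label: sorted(group, key=modulus)
        for label, group in by_class.items()
        if 2 <= len(group) < majority_size
    }

    new_cases, new_labels = [], []
    for _ in range(rounds):
        for label, ordered in minority.items():
            # Sorting by modulus once avoids computing pairwise
            # dissimilarities; neighbours in that order are the pair used.
            i = rng.randrange(len(ordered) - 1)
            a, b = ordered[i], ordered[i + 1]
            t = rng.random()  # interpolation weight in [0, 1)
            new_cases.append([x + t * (y - x) for x, y in zip(a, b)])
            new_labels.append(label)
    return list(cases) + new_cases, list(labels) + new_labels


# Toy data: class "a" is the majority, "b" and "c" are minority classes.
cases = [[0, 0], [0, 1], [1, 0], [1, 1], [0.5, 0.5], [0.2, 0.8],
         [5, 5], [5, 6],
         [9, 1], [9, 2], [10, 1]]
labels = ["a"] * 6 + ["b"] * 2 + ["c"] * 3

balanced_cases, balanced_labels = balance_by_interpolation(cases, labels, rounds=3)
sizes = Counter(balanced_labels)  # class sizes: a=6, b=5, c=6
```

Because every synthetic case lies on the segment between two existing cases of the same class, it stays within the range already spanned by that class. Note that two cases with similar modulus are not necessarily close to each other, so a full method would need a further check that this sketch leaves out.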