ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2021, Vol. 58 ›› Issue (3): 539-547. doi: 10.7544/issn1000-1239.2021.20200324


Credit Fraud Detection for Extremely Imbalanced Data Based on Ensembled Deep Learning

Liu Ying¹, Yang Ke²

  ¹ School of Management Science and Information Engineering, Jilin University of Finance and Economics, Changchun 130117
  ² School of Taxation, Jilin University of Finance and Economics, Changchun 130117
  • Online: 2021-03-01
  • Supported by: 
    This work was supported by the National Social Science Foundation of China (20BTJ062).

Abstract: Class imbalance in credit fraud data significantly degrades model performance. When the sample distribution is extremely imbalanced, noise caused by information distortion, statistical discrepancy, and reporting bias severely disrupts model training and can lead to problems such as overfitting. To address this, this paper proposes an algorithm based on an ensemble deep belief network for extremely imbalanced credit fraud data. First, a joint sampling strategy that combines under-sampling and over-sampling is used to draw the training subsets. Then, a two-stage algorithm constructs clusters of classifiers: support vector classifiers and random forest classifiers are combined using a Boosting algorithm to overcome the deviation of the support vector machine's classification interface (its decision boundary). Finally, a deep belief network aggregates the classifiers' predictions and outputs the final classification result. In addition, traditional evaluation methods place too much emphasis on majority samples and ignore the fact that the minority class matters more in practice. A revenue cost index that accounts for the identification of both positive and negative samples is therefore introduced. An empirical study on European credit card data shows that the proposed algorithm scores 3% higher on the revenue cost index than the average of the other algorithms. The experiments also evaluate the influence of the imbalance ratio on performance and find that the proposed algorithm outperforms the others in this respect.

Key words: credit fraud, extremely imbalanced data, deep belief network (DBN), support vector machine (SVM), revenue cost index
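
The joint sampling step described in the abstract can be illustrated with a minimal sketch. It assumes plain random under-sampling of the majority class (without replacement) and random over-sampling of the minority class (with replacement); the abstract does not specify the exact sampling procedure, so the function name `joint_sample`, the `n_per_class` parameter, and the 0/1 label convention are hypothetical choices made for illustration only.

```python
import random


def joint_sample(labels, n_per_class, seed=0):
    """Return shuffled row indices for one balanced training subset.

    Assumption (not from the paper): label 0 is the majority (legitimate)
    class and label 1 is the minority (fraud) class.
    """
    rng = random.Random(seed)
    majority = [i for i, lab in enumerate(labels) if lab == 0]
    minority = [i for i, lab in enumerate(labels) if lab == 1]
    if not minority:
        raise ValueError("no minority samples to over-sample")
    if n_per_class > len(majority):
        raise ValueError("n_per_class exceeds the number of majority samples")

    # Under-sampling: draw majority rows without replacement.
    under = rng.sample(majority, n_per_class)
    # Over-sampling: draw minority rows with replacement, so rows may repeat.
    over = rng.choices(minority, k=n_per_class)

    subset = under + over
    rng.shuffle(subset)
    return subset


if __name__ == "__main__":
    # Toy data: 995 legitimate transactions and 5 fraudulent ones.
    labels = [0] * 995 + [1] * 5
    subset = joint_sample(labels, n_per_class=50)
    print(len(subset), sum(labels[i] for i in subset))
```

Calling the function with different seeds would yield several distinct balanced subsets, one per base classifier in an ensemble; whether the paper draws its subsets this way is not stated in the abstract.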
