Liu Ying, Yang Ke. Credit Fraud Detection for Extremely Imbalanced Data Based on Ensembled Deep Learning[J]. Journal of Computer Research and Development, 2021, 58(3): 539-547. DOI: 10.7544/issn1000-1239.2021.20200324

Credit Fraud Detection for Extremely Imbalanced Data Based on Ensembled Deep Learning

Funds: This work was supported by the National Social Science Foundation of China (20BTJ062).
  • Published Date: February 28, 2021
  • Class imbalance in credit fraud data significantly degrades model performance. When the sample distribution is extremely imbalanced, noise arising from information distortion, statistical discrepancy, and reporting bias severely disrupts model training and can lead to problems such as overfitting. To address this, the paper proposes an algorithm based on an ensemble deep belief network for credit fraud data with extreme class imbalance. First, a joint sampling strategy that combines under-sampling and over-sampling is used to draw the training subsets. Next, a two-stage algorithm constructs clusters of classifiers: support vector classifiers and random forest classifiers are combined through Boosting to correct the decision-boundary deviation of the support vector machine. Finally, a deep belief network aggregates the classifiers' predictions and outputs the final classification. In addition, traditional evaluation metrics place too much weight on the majority class, even though in practice the minority class matters more. The paper therefore introduces a revenue-cost index that accounts for the identification of both positive and negative samples. An empirical study on European credit card data shows that the proposed algorithm scores 3% higher on the revenue-cost index than the average of the other algorithms. The experiments also evaluate how the imbalance ratio affects performance and find that the proposed algorithm outperforms the others in this respect as well.
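
The joint sampling step mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not say which under-sampling or over-sampling methods are used, so this example assumes plain random sampling on both sides: the majority class is drawn without replacement, and the minority class is kept whole and padded with random duplicates. The function name `joint_sample`, the parameter `n_per_class`, and the label convention (1 for fraud, 0 for normal) are hypothetical and chosen only for illustration.

```python
import random
from collections import Counter


def joint_sample(rows, labels, n_per_class, seed=0):
    """Build one roughly balanced training subset from imbalanced data.

    Assumption: random under-sampling of the majority class (label 0) and
    random over-sampling of the minority class (label 1). The paper may
    use different sampling schemes; this is only a sketch of the idea.
    """
    rng = random.Random(seed)
    majority = [r for r, y in zip(rows, labels) if y == 0]
    minority = [r for r, y in zip(rows, labels) if y == 1]
    if not majority or not minority:
        raise ValueError("both classes must be present")

    # Under-sampling: draw without replacement, capped at the majority size.
    kept_majority = rng.sample(majority, min(n_per_class, len(majority)))

    # Over-sampling: keep every minority row, then add random duplicates
    # until the minority side reaches n_per_class (no-op if already larger).
    extra = max(0, n_per_class - len(minority))
    kept_minority = minority + rng.choices(minority, k=extra)

    sample = [(r, 0) for r in kept_majority] + [(r, 1) for r in kept_minority]
    rng.shuffle(sample)
    xs, ys = zip(*sample)
    return list(xs), list(ys)


if __name__ == "__main__":
    # Toy data: 5 fraud rows among 1000 transactions (hypothetical numbers).
    rows = list(range(1000))
    labels = [1 if i < 5 else 0 for i in rows]
    xs, ys = joint_sample(rows, labels, n_per_class=50)
    print(Counter(ys))
```

Calling the function several times with different seeds would yield several distinct subsets, which is one plausible way to feed the multiple base classifiers that the abstract describes; how the paper actually allocates subsets to classifiers is not stated here.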
