Fu Junjie, Liu Gongshen. A GEV-Based Classification Algorithm for Imbalanced Data[J]. Journal of Computer Research and Development, 2018, 55(11): 2361-2371. DOI: 10.7544/issn1000-1239.2018.20170514

A GEV-Based Classification Algorithm for Imbalanced Data

  • Published Date: October 31, 2018
  • Binary classification with imbalanced data arises in many fields and is still not fully solved. Beyond predicting the class label directly, many applications also need the probability that an instance belongs to a given class. However, most existing research focuses on classification performance and neglects probability estimation. This paper aims to improve class probability estimation (CPE) while preserving classification performance. We propose a new regression approach that adopts the generalized linear model as the basic framework and uses the calibration loss function as the optimization objective. Because the generalized extreme value (GEV) distribution is asymmetric and flexible, we use it to formulate the link function, which suits binary classification with imbalanced data. For model estimation, since the shape parameter strongly influences modeling precision, we propose two methods for estimating the shape parameter of the GEV distribution. Experiments on synthetic datasets confirm the accuracy of the shape parameter estimation. Experimental results on real data suggest that, compared with three other commonly used regression algorithms, the proposed approach performs well on both classification and CPE. The proposed algorithm also outperforms other optimization algorithms in computational efficiency.
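
    To make the role of the GEV link concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not give the exact parameterization, so the sketch assumes the class probability is the standard GEV cumulative distribution function (location 0, scale 1) evaluated at the linear predictor. The names `gev_cdf`, `gev_response`, and `logistic` are hypothetical. Unlike the logistic curve, the GEV curve is not symmetric about 0.5, and the shape parameter `xi` controls how skewed it is.

```python
import math


def gev_cdf(x: float, xi: float) -> float:
    """CDF of the standard GEV distribution (location 0, scale 1), shape xi."""
    if abs(xi) < 1e-12:
        # Gumbel limit as xi -> 0: F(x) = exp(-exp(-x))
        log_s = -x
    else:
        t = 1.0 + xi * x
        if t <= 0.0:
            # Outside the support: 0 below the lower bound (xi > 0),
            # 1 above the upper bound (xi < 0).
            return 0.0 if xi > 0 else 1.0
        # log of (1 + xi * x) ** (-1 / xi), kept in the log domain for stability
        log_s = -math.log(t) / xi
    if log_s > 700.0:
        # exp(log_s) would overflow; the CDF is numerically 0 here
        return 0.0
    return math.exp(-math.exp(log_s))


def gev_response(eta: float, xi: float) -> float:
    """Assumed inverse link: map a linear predictor eta to a probability."""
    return gev_cdf(eta, xi)


def logistic(eta: float) -> float:
    """Symmetric logistic response, shown only for comparison."""
    return 1.0 / (1.0 + math.exp(-eta))


if __name__ == "__main__":
    for eta in (-2.0, -1.0, 0.0, 1.0, 2.0):
        row = [gev_response(eta, xi) for xi in (-0.5, 0.0, 0.5)]
        print(eta, [round(p, 4) for p in row], round(logistic(eta), 4))
```

    Under this assumption the response at a zero linear predictor is exp(-1) rather than 0.5, and the two tails approach 0 and 1 at different rates. This asymmetry is the property the abstract credits for handling imbalanced classes.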