Xiong Bingyan, Wang Guoyin, Deng Weibin. Under-Sampling Method Based on Sample Weight for Imbalanced Data[J]. Journal of Computer Research and Development, 2016, 53(11): 2613-2622. DOI: 10.7544/issn1000-1239.2016.20150593

Under-Sampling Method Based on Sample Weight for Imbalanced Data

  • Published Date: October 31, 2016
  • Imbalanced data is widespread in the real world, and its classification is an active topic in data mining and machine learning. Under-sampling is a widely used method for dealing with imbalanced data sets; its main idea is to choose a subset of the majority class so that the data set becomes balanced. However, some useful majority class information may be lost in the process. To address this problem, an under-sampling method based on sample weight, named KAcBag (K-means AdaCost bagging), is proposed. In this method, a sample weight is introduced to indicate the region in which each sample is located. First, a weight is assigned to each sample according to the sample scale and is then modified after clustering the data set; samples with smaller weights lie in the center of the majority class. Next, samples are drawn from the majority class according to their weights, so that samples in the center of the majority class are readily selected. The sampled majority class samples and all the minority class samples are combined as the training data set for a component classifier. In this way, several decision tree sub-classifiers are obtained. Finally, the prediction model is constructed based on the accuracy of each sub-classifier. Experiments on nineteen UCI data sets and telecom user data show that KAcBag selects more representative samples. As a result, the method can improve classification performance on the minority class and reduce the scale of the problem.
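
The sampling and combination steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The abstract does not give the weight formula or the clustering adjustment, so the sketch assumes that positive per-sample selection weights are supplied by the caller, with a larger value meaning the sample is more likely to be drawn. The function names (`weighted_undersample`, `build_balanced_bags`, `accuracy_weighted_vote`) are hypothetical, the weighted draw uses Efraimidis-Spirakis keys as a stand-in technique, and training of the decision tree sub-classifiers is omitted.

```python
import random
from collections import defaultdict


def weighted_undersample(majority, weights, k, rng):
    """Draw k distinct majority samples, favoring larger selection weights.

    Assumption: weights are positive and already encode selection preference.
    Uses Efraimidis-Spirakis keys (u ** (1 / w)) and keeps the k largest keys.
    """
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights)]
    keyed.sort(reverse=True)
    return [majority[i] for _, i in keyed[:k]]


def build_balanced_bags(majority, minority, weights, n_bags, seed=0):
    """Build n_bags training sets: sampled majority plus all minority samples."""
    rng = random.Random(seed)
    bags = []
    for _ in range(n_bags):
        sampled = weighted_undersample(majority, weights, len(minority), rng)
        bags.append(sampled + list(minority))
    return bags


def accuracy_weighted_vote(predictions, accuracies):
    """Combine sub-classifier predictions, weighting each vote by its accuracy."""
    scores = defaultdict(float)
    for label, acc in zip(predictions, accuracies):
        scores[label] += acc
    return max(scores, key=scores.get)


# Toy data: each sample is a (features, label) pair; all values are made up.
majority = [((float(i), 0.0), 0) for i in range(20)]
minority = [((100.0 + i, 1.0), 1) for i in range(5)]
selection_weights = [1.0 + (i % 4) for i in range(20)]  # hypothetical weights

bags = build_balanced_bags(majority, minority, selection_weights, n_bags=3)
```

Each bag would then be used to train one sub-classifier, and `accuracy_weighted_vote` shows one plausible way to combine their outputs according to accuracy; the abstract does not specify the exact combination rule.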
