Guo Husheng, Zhang Yang, Wang Wenjian. Two-Stage Adaptive Ensemble Learning Method for Different Types of Concept Drift[J]. Journal of Computer Research and Development, 2024, 61(7): 1799-1811. DOI: 10.7544/issn1000-1239.202330452

Two-Stage Adaptive Ensemble Learning Method for Different Types of Concept Drift

Funds: This work was supported by the National Natural Science Foundation of China (62276157, U21A20513, 62076154, 61503229) and the Key Research and Development Program of Shanxi Province (202202020101003).
More Information
  • Author Bio:

    Guo Husheng: born in 1986. PhD, professor, PhD supervisor. Senior member of CCF. His main research interests include data mining, machine learning, and computational intelligence

    Zhang Yang: born in 1999. Master. Her main research interests include streaming data mining and online machine learning

    Wang Wenjian: born in 1968. PhD, professor, PhD supervisor. Distinguished member of CCF. Her main research interests include machine learning, data mining, and computational intelligence

  • Received Date: June 04, 2023
  • Revised Date: October 08, 2023
  • Available Online: April 09, 2024
  • In the era of big data, streaming data is emerging in large volumes. Concept drift, the most typical and challenging problem in streaming data mining, has received increasing attention. Ensemble learning is a common approach to handling concept drift in streaming data. However, after drift occurs, learning models often fail to respond in time to the changed data distribution and cannot effectively handle different types of concept drift, which degrades the model's generalization performance. To address this problem, we propose a two-stage adaptive ensemble learning method for different types of concept drift (TAEL). First, the concept drift type is determined by detecting the drift span. Then, based on the drift type, a "filtering-expansion" two-stage sample processing mechanism dynamically selects an appropriate sample processing strategy. In the filtering stage, different non-critical sample filters are created for different drift types to extract key samples from historical sample blocks, making the historical data distribution closer to the latest data distribution and improving the effectiveness of the base learners. In the expansion stage, a block-priority sampling method is proposed: it sets a sampling scale appropriate to the drift type and assigns each historical key sample a sampling priority according to the size proportion, within the current sample block, of the class to which that sample belongs. The sampling probability is then determined from the sampling priority, and a subset of key samples is drawn from the historical key sample blocks according to this probability to expand the current sample block. This alleviates class imbalance after sample expansion, mitigates underfitting of the current base learner, and enhances its stability. Experimental results show that the proposed method responds to different concept drift types in a timely manner, accelerates the convergence of the online ensemble model after drift occurs, and improves the overall generalization performance of the model.
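The block-priority sampling step in the expansion stage can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the function name `block_priority_sample`, the inverse-proportion priority rule, and the use of an Efraimidis-Spirakis-style weighted draw are all choices made for the sketch.

```python
import random
from collections import Counter

def block_priority_sample(historical_key_samples, current_block_labels, scale, rng=random):
    """Sketch of block-priority sampling: historical key samples whose class
    is rare in the current block receive a higher sampling priority, so the
    expanded block is less class-imbalanced (assumed priority rule)."""
    counts = Counter(current_block_labels)
    total = len(current_block_labels)

    def priority(label):
        # Inverse of the class's share in the current block; a class absent
        # from the block gets the maximum priority of 1.0.
        return 1.0 - counts.get(label, 0) / total

    k = min(scale, len(historical_key_samples))
    # Weighted sampling without replacement: draw u ~ U(0,1) per sample,
    # rank by u ** (1 / weight), and keep the top k (Efraimidis-Spirakis trick).
    keyed = sorted(
        historical_key_samples,
        key=lambda s: rng.random() ** (1.0 / max(priority(s[1]), 1e-9)),
        reverse=True,
    )
    return keyed[:k]
```

With a current block dominated by one class, samples of the minority class are drawn far more often, which is the imbalance-mitigating behavior the abstract describes.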

  • [1]
    Rutkowski L, Jaworski M, Duda P, et al. Basic concepts of data stream mining[J]. Stream Data Mining:Algorithms and Their Probabilistic Properties, 2020, 56: 13−33
    [2]
    翟婷婷,高阳,朱俊武. 面向流数据分类的在线学习综述[J]. 软件学报,2020,31(4):912−931

    Zhai Tingting, Gao Yang, Zhu Junwu. Survey of online learning algorithms for streaming data classification[J]. Journal of Software, 2020, 31(4): 912−931 (in Chinese)
    [3]
    王涛,李舟军,颜跃进,等. 数据流挖掘分类技术综述[J]. 计算机研究与发展,2007,44(11):1809−1815 doi: 10.1360/crad20071101

    Wang Tao, Li Zoujun, Yan Yuejin, et al. A survey of classification of data stream[J]. Journal of Computer Research and Development, 2007, 44(11): 1809−1815 (in Chinese) doi: 10.1360/crad20071101
    [4]
    杜航原,王文剑,白亮. 一种基于优化模型的演化数据流聚类方法[J]. 中国科学:信息科学,2017,47(11):1464−1482 doi: 10.1360/N112017-00107

    Du Hangyuan, Wang Wenjian, Bai Liang. A novel evolving data stream clustering method based on optimization model[J]. SCIENTIA SINICA Informationis, 2017, 47(11): 1464−1482 (in Chinese) doi: 10.1360/N112017-00107
    [5]
    Krempl G, Žliobaite I, Brzeziński D, et al. Open challenges for data stream mining research[J]. ACM SIGKDD Explorations Newsletter, 2014, 16(1): 1−10 doi: 10.1145/2674026.2674028
    [6]
    文益民,刘帅,缪裕青,等. 概念漂移数据流半监督分类综述[J]. 软件学报,2022,33(4):1287−1314

    Wen Yimin, Liu Shuai, Miao Yuqing, et al. Survey on semi-supervised classification of data streams with concept drifts[J]. Journal of Software, 2022, 33(4): 1287−1314 (in Chinese)
    [7]
    Lu Jie, Liu Anjin, Dong Fan, et al. Learning under concept drift: A review[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 31(12): 2346−2363
    [8]
    郭虎升,张爱娟,王文剑. 基于在线性能测试的概念漂移检测方法[J]. 软件学报,2020,31(4):932−947

    Guo Husheng, Zhang Aijuan, Wang Wenjian. Concept drift detection method based on online performance test[J]. Journal of Software, 2020, 31(4): 932−947 (in Chinese)
    [9]
    Lu Ning, Zhang Guangquan, Lu Jie. Concept drift detection via competence models[J]. Artificial Intelligence, 2014, 209: 11−28 doi: 10.1016/j.artint.2014.01.001
    [10]
    Krawczyk B, Minku L L, Gama J, et al. Ensemble learning for data stream analysis: A survey[J]. Information Fusion, 2017, 37: 132−156 doi: 10.1016/j.inffus.2017.02.004
    [11]
    梁斌,李光辉,代成龙. 面向概念漂移且不平衡数据流的G-mean加权分类方法[J]. 计算机研究与发展,2022,59(12):2844−2857 doi: 10.7544/issn1000-1239.20210471

    Liang Bin, Li Guanghui, Dai Chenglong. G-mean weighted classification method for imbalanced data stream with concept drift[J]. Journal of Computer Research and Development, 2022, 59(12): 2844−2857 (in Chinese) doi: 10.7544/issn1000-1239.20210471
    [12]
    Gomes H M, Barddal J P, Enembreck F, et al. A survey on ensemble learning for data stream classification[J]. ACM Computing Surveys, 2017, 50(2): 1−36
    [13]
    Webb G I, Hyde R, Cao Hong, et al. Characterizing concept drift[J]. Data Mining and Knowledge Discovery, 2016, 30(4): 964−994 doi: 10.1007/s10618-015-0448-4
    [14]
    Bifet A, Gavalda R. Learning from time-changing data with adaptive windowing [C]//Proc of the 7th SIAM Int Conf on Data Mining. Philadelphia, PA: SIAM, 2007: 443−448
    [15]
    Gama J, Medas P, Castillo G, et al. Learning with drift detection [C]//Proc of the 17th Brazilian Symp on Artificial Intelligence. Berlin: Springer, 2004: 286−295
    [16]
    Nishida K, Yamauchi K. Detecting concept drift using statistical testing [C]//Proc of the 10th Int Conf on Discovery Science. Berlin: Springer, 2007: 264−269
    [17]
    Zhu Qun, Hu Xuegang, Zhang Yuhong, et al. A double-window-based classification algorithm for concept drifting data streams [C]//Proc of 2010 IEEE Int Conf on Granular Computing. Piscataway, NJ: IEEE, 2010: 639−644
    [18]
    郭虎升,任巧燕,王文剑. 基于时序窗口的概念漂移类别检测[J]. 计算机研究与发展,2022,59(1):127−143

    Guo Husheng, Ren Qiaoyan, Wang Wenjian. Concept drift class detection based on time window[J]. Journal of Computer Research and Development, 2022, 59(1): 127−143 (in Chinese)
    [19]
    Guo Husheng, Li Hai, Ren Qiaoyan, et al. Concept drift type identification based on multi-sliding windows[J]. Information Sciences, 2022, 585: 1−23 doi: 10.1016/j.ins.2021.11.023
    [20]
    Sidhu P, Bhatia M P S. An online ensembles approach for handling concept drift in data streams: Diversified online ensembles detection[J]. International Journal of Machine Learning and Cybernetics, 2015, 6(6): 883−909 doi: 10.1007/s13042-015-0366-1
    [21]
    Minku L L, White A P, Yao Xin. The impact of diversity on online ensemble learning in the presence of concept drift[J]. IEEE Transactions on Knowledge and Data Engineering, 2009, 22(5): 730−742
    [22]
    Shan Jicheng, Zhang Hang, Liu Weike, et al. Online active learning ensemble framework for drifted data streams[J]. IEEE Transactions on Neural Networks and Learning Systems, 2018, 30(2): 486−498
    [23]
    Sun Yu, Tang Ke, Minku L L, et al. Online ensemble learning of data streams with gradually evolved classes[J]. IEEE Transactions on Knowledge and Data Engineering, 2016, 28(6): 1532−1545 doi: 10.1109/TKDE.2016.2526675
    [24]
    Street W N, Kim Y S. A streaming ensemble algorithm (SEA) for large-scale classification [C]//Proc of the 7th ACM SIGKDD Int Conf on Knowledge Discovery and Data Mining. New York: ACM, 2001: 377-382
    [25]
    Lu Yang, Cheung Y M, Tang Yuanyan. Dynamic weighted majority for incremental learning of imbalanced data streams with concept drift [C]//Proc of the 26th Int Joint Conf on Artificial Intelligence. Palo Alto, CA: AAAI, 2017: 2393−2399
    [26]
    Lu Yang, Cheung Y M, Tang Yuanyan. Adaptive chunk-based dynamic weighted majority for imbalanced data streams with concept drift[J]. IEEE Transactions on Neural Networks and Learning Systems, 2019, 31(8): 2764−2778
    [27]
    Ren Siqi, Zhu Wen, Liao Bo, et al. Selection-based resampling ensemble algorithm for nonstationary imbalanced stream data learning[J]. Knowledge-Based Systems, 2019, 163: 705−722 doi: 10.1016/j.knosys.2018.09.032
    [28]
    Li Zeng, Huang Wenchao, Xiong Yan, et al. Incremental learning imbalanced data streams with concept drift: The dynamic updated ensemble algorithm[J]. Knowledge-Based Systems, 2020, 195: 105694 doi: 10.1016/j.knosys.2020.105694
    [29]
    Guo Husheng, Zhang Shuai, Wang Wenjian. Selective ensemble-based online adaptive deep neural networks for streaming data with concept drift[J]. Neural Networks, 2021, 142: 437−456 doi: 10.1016/j.neunet.2021.06.027
    [30]
    Bifet A, Holmes G, Pfahringer B, et al. MOA: Massive online analysis, a framework for stream classification and clustering [C]//Proc of the 1st Workshop on Applications of Pattern Analysis. New York: PMLR, 2010: 44−50
    [31]
    赵鹏,周志华. 基于决策树模型重用的分布变化流数据学习 [J]. 中国科学:信息科学,2021,51(1):1−12

    Zhao Peng, Zhou Zhihua. Learning from distribution-changing data streams via decision tree model reuse [J]. SCIENTIA SINICA Informationis, 2021, 51(1): 1−12 (in Chinese)
    [32]
    Sahoo D, Pham Q, Lu Jing, et al. Online deep learning: Learning deep neural networks on the fly [C]//Proc of the 27th Int Joint Conf on Artificial Intelligence. Amsterdam: Elsevier, 2018: 2660−2666
    [33]
    He Kaiming, Zhang Xiangyu, Ren Shaoqing, et al. Deep residual learning for image recognition [C]//Proc of the IEEE Conf on Computer Vision and Pattern Recognition. Piscataway, NJ: IEEE, 2016: 770−778
    [34]
    Srivastava R K, Greff K, Schmidhuber J. Training very deep networks [C]//Proc of the 28th Int Conf on Neural Information Processing Systems. Cambridge, MA: MIT, 2015: 2377−2385
    [35]
    Pereira D G, Afonso A, Medeiros F M. Overview of Friedman’s test and post-hoc analysis[J]. Communications in Statistics-Simulation and Computation, 2015, 44(10): 2636−2653 doi: 10.1080/03610918.2014.931971
    [36]
    Demšar J. Statistical comparisons of classifiers over multiple data sets[J]. The Journal of Machine Learning Research, 2006, 7: 1−30
  • Related Articles

    [1]Tang Xiaolan, Liang Yuting, Chen Wenlong. Multi-Stage Federated Learning Mechanism with non-IID Data in Internet of Vehicles[J]. Journal of Computer Research and Development, 2024, 61(9): 2170-2184. DOI: 10.7544/issn1000-1239.202330885
    [2]Zhao Xingwang, Zhang Yaopu, Liang Jiye. Two-Stage Ensemble-Based Community Discovery Algorithm in Multilayer Networks[J]. Journal of Computer Research and Development, 2023, 60(12): 2832-2843. DOI: 10.7544/issn1000-1239.202220214
    [3]Guo Husheng, Cong Lu, Gao Shuhua, Wang Wenjian. Adaptive Classification Method for Concept Drift Based on Online Ensemble[J]. Journal of Computer Research and Development, 2023, 60(7): 1592-1602. DOI: 10.7544/issn1000-1239.202220245
    [4]Wang Ruiqin, Wu Zongda, Jiang Yunliang, Lou Jungang. An Integrated Recommendation Model Based on Two-stage Deep Learning[J]. Journal of Computer Research and Development, 2019, 56(8): 1661-1669. DOI: 10.7544/issn1000-1239.2019.20190178
    [5]Chen Junyu, Zhou Gang, Nan Yu, Zeng Qi. Semi-Supervised Local Expansion Method for Overlapping Community Detection[J]. Journal of Computer Research and Development, 2016, 53(6): 1376-1388. DOI: 10.7544/issn1000-1239.2016.20148339
    [6]Zhang Jun, He Yanxiang, Shen Fanfan, Jiang Nan, Li Qing’an. Two-Stage Synchronization Based Thread Block Compaction Scheduling Method of GPGPU[J]. Journal of Computer Research and Development, 2016, 53(6): 1173-1185. DOI: 10.7544/issn1000-1239.2016.20150114
    [7]Shao Zengzhen, Wang Hongguo, Liu Hong, Song Chaochao, Meng Chunhua, Yu Hongling. Heuristic Optimization Algorithms of Multi-Carpooling Problem Based on Two-Stage Clustering[J]. Journal of Computer Research and Development, 2013, 50(11): 2325-2335.
    [8]Chang Qun, Wang Xiaolong, Lin Yimeng, Daniel S. Yeung, Chen Qingcai. Reducing Gaussian Kernel's Local Risks by Global Kernel and Two-Stage Model Selection Based on Genetic Algorithms[J]. Journal of Computer Research and Development, 2007, 44(3).
    [9]Guan Jianbo, Sun Zhigang, and Lu Xicheng. Using Multi-Stage Switch Fabric in High Performance Router Design[J]. Journal of Computer Research and Development, 2005, 42(6): 965-970.
    [10]Wu Weilin, Lu Ruzhan, Duan Jianyong, Liu Hui, Gao Feng, Chen Yuquan. A Spoken Language Understanding Approach Based on Two-Stage Classification[J]. Journal of Computer Research and Development, 2005, 42(5): 861-868.
  • Cited by

    Periodical cited type(2)

    1. Yin Yuyu, Wu Guangqiang, Li Youhuizi, Wang Xinyu, Gao Honghao. A machine unlearning method based on feature constraints and adaptive loss balancing[J]. Journal of Computer Research and Development, 2024(10): 2649-2661 (in Chinese)
    2. Wang Yong, Xiong Yi, Yang Tianyu, Shen Yiran. A secure continuous user authentication method for ear-worn devices[J]. Journal of Computer Research and Development, 2024(11): 2821-2834 (in Chinese)

    Other cited types(1)
