Citation: Guo Husheng, Cong Lu, Gao Shuhua, Wang Wenjian. Adaptive Classification Method for Concept Drift Based on Online Ensemble[J]. Journal of Computer Research and Development, 2023, 60(7): 1592-1602. DOI: 10.7544/issn1000-1239.202220245
After concept drift occurs in streaming data, an online learning model often cannot respond in time to the change in data distribution and struggles to extract the latest distribution information, which slows the model's convergence. To address these problems, an adaptive classification method for concept drift based on online ensemble (AC_OE) is presented. On the one hand, an online ensemble strategy is used to construct a local online learner that dynamically adjusts the weights of its base learners according to their local predictions on the training samples in each data block. This helps extract the evolution information of the streaming data in depth, so that the model responds more accurately to changes in data distribution, and it improves both the adaptability of the online learning model to the new distribution after concept drift and its real-time generalization performance. On the other hand, an incremental learning strategy is used to construct a global incremental learner, which is updated incrementally as new samples arrive. This learner extracts the global distribution information of the streaming data, so the model remains robust while the stream is in a steady state. Experimental results show that the proposed method responds to concept drift, accelerates the convergence of the online learning model, and effectively improves the overall generalization performance of the learner.
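The local/global idea described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' AC_OE implementation: the abstract does not give the update rules, so everything specific here is an assumption made for illustration. That includes the logistic base learner (`OnlineLogit`), the multiplicative reweighting `weight *= beta ** error_rate`, the equal averaging of local and global outputs, the learning rates, and the synthetic stream in `run_demo`.

```python
import math
import random


class OnlineLogit:
    """Tiny logistic-regression learner trained one sample at a time (labels 0/1)."""

    def __init__(self, n_features, lr):
        self.w = [0.0] * n_features
        self.b = 0.0
        self.lr = lr

    def proba(self, x):
        z = self.b + sum(wi * xi for wi, xi in zip(self.w, x))
        z = max(-30.0, min(30.0, z))  # clamp so exp() stays in range
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.proba(x) >= 0.5 else 0

    def learn(self, x, y):
        g = y - self.proba(x)  # gradient of the log-likelihood w.r.t. z
        self.w = [wi + self.lr * g * xi for wi, xi in zip(self.w, x)]
        self.b += self.lr * g


class LocalGlobalSketch:
    """Hypothetical pairing of a weighted local ensemble with a global learner."""

    def __init__(self, n_features, lrs=(0.05, 0.5, 2.0), beta=0.5):
        self.local = [OnlineLogit(n_features, lr) for lr in lrs]
        self.weights = [1.0 / len(lrs)] * len(lrs)
        self.global_learner = OnlineLogit(n_features, 0.1)
        self.beta = beta  # assumed penalty base, 0 < beta < 1

    def proba(self, x):
        local = sum(w * m.proba(x) for w, m in zip(self.weights, self.local))
        # Assumption: local and global outputs are simply averaged.
        return 0.5 * local + 0.5 * self.global_learner.proba(x)

    def predict(self, x):
        return 1 if self.proba(x) >= 0.5 else 0

    def process_block(self, block):
        """Test-then-train on one data block; returns accuracy before training."""
        acc = sum(self.predict(x) == y for x, y in block) / len(block)
        # Reweight each base learner by its error rate on this block.
        for i, m in enumerate(self.local):
            err = sum(m.predict(x) != y for x, y in block) / len(block)
            self.weights[i] *= self.beta ** err
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        # Incremental update of every learner, sample by sample.
        for x, y in block:
            for m in self.local:
                m.learn(x, y)
            self.global_learner.learn(x, y)
        return acc


def run_demo(n_blocks=40, block_size=50, drift_at=20, seed=0):
    """Synthetic stream whose labelling rule flips at block `drift_at`."""
    rng = random.Random(seed)
    model = LocalGlobalSketch(n_features=2)
    accs = []
    for b in range(n_blocks):
        block = []
        for _ in range(block_size):
            x = (rng.random(), rng.random())
            y = int(x[0] > x[1])
            block.append((x, 1 - y if b >= drift_at else y))
        accs.append(model.process_block(block))
    return model, accs


if __name__ == "__main__":
    model, accs = run_demo()
    print("accuracy before drift:", accs[19])
    print("accuracy right after drift:", accs[20])
    print("accuracy at end:", accs[-1])
    print("local weights:", model.weights)
```

Each block is scored before it is used for training, so the per-block accuracy reflects how well the current model fits the incoming distribution. In this sketch, a drop in that accuracy lowers the weights of the base learners that misclassify the block most, while the global learner keeps accumulating information from the whole stream.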