Citation: Guo Husheng, Liu Yanjie, Wang Wenjian. Concept Drift Processing Method of Streaming Data Based on Mixed Feature Extraction[J]. Journal of Computer Research and Development, 2024, 61(6): 1497-1510. DOI: 10.7544/issn1000-1239.202330184
In the era of big data, more and more data are generated in the form of streams. Because streaming data arrive rapidly, are potentially unbounded, and change dynamically, concept drift has become an important yet difficult problem in streaming data mining. Most existing concept drift handling methods have limited information extraction capability and do not fully exploit the temporal features of streaming data. To address these problems, a concept drift processing method of streaming data based on mixed feature extraction (MFECD) is proposed. The method first models the data with convolutional kernels of different scales to construct spliced features, and then uses a gating mechanism to adaptively fuse the shallow inputs with the spliced features as inputs to different network layers, so that the resulting data features capture both detailed and semantic information. On this basis, an attention mechanism and similarity computation are used to evaluate the importance of the stream at different moments, thereby emphasizing the temporal features at key positions of the data stream. Experimental results show that the proposed method can effectively extract the complex data features and temporal features contained in streaming data and improves the ability to handle concept drift in data streams.
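To make the mixed feature extraction idea concrete, the sketch below shows one way such a block could be organized: multi-scale 1D convolutions over a sliding window produce spliced features, a sigmoid gate fuses them with the projected shallow input, and a simple temporal attention weights the time steps of the window. This is only a minimal illustration under assumed shapes and hyperparameters (window length, hidden width, kernel sizes, module names such as MixedFeatureBlock are all hypothetical); it is not the authors' released implementation of MFECD.

```python
# Minimal sketch (PyTorch), assuming a streaming window of shape
# (batch, window_len, n_features). All names and sizes are illustrative.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MixedFeatureBlock(nn.Module):
    def __init__(self, n_features, hidden=32, kernel_sizes=(3, 5, 7)):
        super().__init__()
        # Convolution kernels of different scales along the time axis.
        self.convs = nn.ModuleList(
            nn.Conv1d(n_features, hidden, k, padding=k // 2) for k in kernel_sizes
        )
        spliced_dim = hidden * len(kernel_sizes)
        # Project the shallow input to the spliced-feature width for gated fusion.
        self.shallow_proj = nn.Linear(n_features, spliced_dim)
        self.gate = nn.Linear(2 * spliced_dim, spliced_dim)
        # Simple temporal attention: one importance score per time step.
        self.attn = nn.Linear(spliced_dim, 1)

    def forward(self, x):                            # x: (B, T, F)
        h = x.transpose(1, 2)                        # (B, F, T) for Conv1d
        spliced = torch.cat([F.relu(c(h)) for c in self.convs], dim=1)
        spliced = spliced.transpose(1, 2)            # (B, T, 3*hidden) spliced features
        shallow = self.shallow_proj(x)               # (B, T, 3*hidden) shallow input
        g = torch.sigmoid(self.gate(torch.cat([shallow, spliced], dim=-1)))
        fused = g * spliced + (1 - g) * shallow      # gated fusion of both feature sources
        w = torch.softmax(self.attn(fused), dim=1)   # attention weights over time steps
        return (w * fused).sum(dim=1)                # (B, 3*hidden) window representation

if __name__ == "__main__":
    # Example: encode a batch of 4 windows, each with 16 samples of 10 features.
    block = MixedFeatureBlock(n_features=10)
    window = torch.randn(4, 16, 10)
    print(block(window).shape)                       # torch.Size([4, 96])
```

The returned window representation could then feed a downstream online classifier that is updated as the stream evolves; the gate lets the model lean on shallow details or deeper semantic features depending on the current data.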