Adaptive Classification Method for Concept Drift Based on Online Ensemble

Guo Husheng; Cong Lu; Gao Shuhua; Wang Wenjian

doi:10.7544/issn1000-1239.202220245

Abstract

After concept drift occurs in streaming data, an online learning model cannot respond promptly to the changed data distribution and struggles to extract the latest distribution information, so the model converges slowly. To address this problem, an adaptive classification method for concept drift based on online ensemble (AC_OE) is proposed. On the one hand, the method uses an online ensemble strategy to construct an online ensemble learner, which makes local predictions on the training samples in each data block to dynamically adjust the weights of the base learners. This helps to extract in depth the evolution information of the streaming data near the drift point, to respond accurately to changes in data distribution, to improve the adaptability of the online learning model to the new data distribution after concept drift, and to improve the real-time generalization performance of the learning model. On the other hand, the method uses an incremental learning strategy to construct an incremental learner, which is updated incrementally as new samples arrive and extracts the global distribution information of the streaming data, so that the model remains robust while the stream is stable. Experimental results show that the proposed method responds promptly to concept drift, accelerates the convergence of the online learning model, and effectively improves the overall generalization performance of the learner.
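The two components described in the abstract can be illustrated with a minimal sketch. It is not the AC_OE algorithm itself: the abstract does not give the weight-update rule, the base learner, or how the two learners are combined, so everything below is an assumption made for illustration. In particular, `CentroidLearner`, `BlockEnsembleSketch`, `max_members`, the use of block accuracy as the weight, and the weighted vote are all hypothetical choices.

```python
from collections import defaultdict


class CentroidLearner:
    """Nearest-centroid classifier updated one sample at a time (assumed base learner)."""

    def __init__(self):
        self.sums = {}                  # label -> per-feature running sum
        self.counts = defaultdict(int)  # label -> number of samples seen

    def learn_one(self, x, y):
        if y not in self.sums:
            self.sums[y] = [0.0] * len(x)
        self.sums[y] = [s + v for s, v in zip(self.sums[y], x)]
        self.counts[y] += 1

    def predict_one(self, x):
        if not self.sums:
            return None  # nothing learned yet

        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))

        return min(self.sums, key=sq_dist)


def block_accuracy(learner, block):
    """Fraction of (x, y) pairs in the block that the learner predicts correctly."""
    if not block:
        return 0.0
    return sum(learner.predict_one(x) == y for x, y in block) / len(block)


class BlockEnsembleSketch:
    """Hypothetical combination of a block-weighted ensemble and one incremental learner."""

    def __init__(self, max_members=5):
        self.max_members = max_members
        self.members = []                        # list of [learner, weight]
        self.global_learner = CentroidLearner()  # sees every sample in the stream
        self.global_weight = 0.0

    def process_block(self, block):
        # Local prediction on the new block: re-weight every existing learner
        # by its accuracy on that block (assumed weighting rule).
        for member in self.members:
            member[1] = block_accuracy(member[0], block)
        self.global_weight = block_accuracy(self.global_learner, block)

        # Train a fresh base learner on this block only, and update the
        # incremental learner with the same samples.
        fresh = CentroidLearner()
        for x, y in block:
            fresh.learn_one(x, y)
            self.global_learner.learn_one(x, y)
        self.members.append([fresh, 1.0])  # assumed initial weight for the newest member

        # Keep the ensemble bounded by dropping the lowest-weight member.
        if len(self.members) > self.max_members:
            self.members.remove(min(self.members, key=lambda m: m[1]))

    def predict_one(self, x):
        votes = defaultdict(float)
        for learner, weight in self.members:
            votes[learner.predict_one(x)] += weight
        votes[self.global_learner.predict_one(x)] += self.global_weight
        return max(votes, key=votes.get)
```

In a toy stream where the labels are swapped between two blocks, the members trained before the swap receive low weights on the first post-drift block, so the newest member dominates the vote, while the incremental learner regains influence only as its accuracy on later blocks recovers.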
