Concept Drift Processing Method of Streaming Data Based on Mixed Feature Extraction

Guo Husheng (郭虎升); Liu Yanjie (刘艳杰); Wang Wenjian (王文剑)

doi:10.7544/issn1000-1239.202330184

Abstract

In the era of big data, a growing share of data is generated as data streams. Because such streams are fast, unbounded, unstable, and dynamically changing, concept drift has become an important but difficult problem in streaming data mining. Most existing concept drift processing methods have limited information extraction capability and do not fully account for the temporal characteristics of streaming data. To address these problems, a concept drift processing method of streaming data based on mixed feature extraction (MFECD) is proposed. The method first models the data with convolution kernels of different scales to build concatenated features, then uses a gating mechanism to fuse the shallow input with the concatenated features; the fused result serves as input to different network levels for adaptive integration, yielding data features that capture both detailed and semantic information. On this basis, an attention mechanism and similarity calculation are used to evaluate the importance of the streaming data at different moments, strengthening the temporal features at key positions of the data stream. Experimental results show that the method can effectively extract the complex data features and temporal features contained in streaming data and improves the handling of concept drift in data streams.
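The three stages named in the abstract (multi-scale convolution, gated fusion, and attention with similarity scoring) can be illustrated with a minimal sketch. This is not the authors' implementation. Every concrete choice below is an assumption made for illustration: the kernel sizes (1, 3, 5), the fixed averaging kernels standing in for learned ones, the scalar gate parameters `gate_weight` and `gate_bias`, and the use of cosine similarity to the most recent time step as the attention score.

```python
import math


def conv1d_same(x, kernel):
    """Slide a kernel over x with zero padding so the output has len(x).

    Assumes an odd kernel length. The kernels used here are symmetric,
    so correlation and convolution coincide.
    """
    half = len(kernel) // 2
    padded = [0.0] * half + list(x) + [0.0] * half
    return [sum(k * padded[i + j] for j, k in enumerate(kernel))
            for i in range(len(x))]


def multi_scale_features(x, sizes=(1, 3, 5)):
    """One feature map per scale; averaging kernels are placeholders
    (assumption) for kernels that a real model would learn."""
    return [conv1d_same(x, [1.0 / s] * s) for s in sizes]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def gated_fusion(x, scale_feats, gate_weight=1.0, gate_bias=0.0):
    """Highway-style gate (assumption): g * x + (1 - g) * h for each
    scale, with the per-scale results concatenated into one vector."""
    fused = []
    for h in scale_feats:
        for xi, hi in zip(x, h):
            g = sigmoid(gate_weight * hi + gate_bias)
            fused.append(g * xi + (1.0 - g) * hi)
    return fused


def cosine(a, b):
    dot = sum(p * q for p, q in zip(a, b))
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    return dot / (na * nb) if na and nb else 0.0


def temporal_attention(window):
    """Score each time step by cosine similarity to the latest step,
    normalise with softmax, and return (weights, weighted sum)."""
    query = window[-1]
    scores = [cosine(step, query) for step in window]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    context = [sum(w * step[d] for w, step in zip(weights, window))
               for d in range(len(query))]
    return weights, context


def extract(window_raw):
    """Run the three stages over a window of raw samples."""
    feats = [gated_fusion(x, multi_scale_features(x)) for x in window_raw]
    return temporal_attention(feats)


window = [[1.0, 2.0, 3.0, 4.0], [4.0, 3.0, 2.0, 1.0], [1.0, 2.0, 3.0, 5.0]]
weights, context = extract(window)
```

In this sketch, time steps whose fused features resemble the newest sample receive larger weights, which is one simple way to emphasise the moments most relevant to the current concept. A trained model would replace the fixed kernels and gate parameters with learned ones.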
