Adaptive Classification Algorithm for Concept Drift Data Stream

Cai Huan (蔡桓); Lu Kezhong (陆克中); Wu Qirong (伍启荣); Wu Dingming (吴定明)

doi:10.7544/issn1000-1239.20201017

Adaptive Classification Algorithm for Concept Drift Data Stream

Abstract

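Abstract: Data stream classification is one of the most important tasks in data mining, and the concept drift inherent in data streams poses a major challenge to classification algorithms. Optimizing the extreme learning machine is a popular direction for solving the data stream classification problem, but most current algorithms learn with model parameters specified in advance, so the classification model performs well only on particular datasets. To address this problem, a simple and effective algorithm for handling concept drift is proposed: the adaptive online sequential extreme learning machine classification algorithm. By introducing an adaptive model complexity mechanism, the algorithm achieves better classification performance. By further introducing an adaptive forgetting factor and a concept drift detection mechanism, it learns adaptively from the dynamically changing data stream and therefore adapts better to concept drift. An outlier detection mechanism is also introduced to keep outliers from damaging the classification decision boundary. Simulation experiments show that the proposed algorithm offers better stability, classification accuracy, and concept drift adaptability than comparable algorithms. In addition, ablation experiments confirm the effectiveness of the three mechanisms introduced by the algorithm.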

Abstract: Data stream classification is one of the most important tasks in data mining, and concept drift in data streams degrades classifier performance and makes the task considerably harder. The extreme learning machine is widely used in data stream classification. However, its parameters have to be determined in advance, which makes it ill-suited to data streams, since fixed parameters cannot adapt to changes in the concept or data distribution over time. To tackle this problem, this paper proposes an adaptive online sequential extreme learning machine algorithm. It has an adjustable mechanism for model complexity, which improves classification performance. It is robust to concept drift through adaptive learning based on a forgetting factor and concept drift detection. In addition, the proposed algorithm detects anomalies to prevent classification decision boundaries from being ruined. Extensive experiments demonstrate that the proposed approach outperforms competitors in terms of stability, classification accuracy, and adaptability to concept drift. Moreover, the effectiveness of the proposed mechanisms has been confirmed via ablation experiments.
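To make the role of a forgetting factor concrete, the sketch below shows a generic online sequential extreme learning machine updated by recursive least squares with a constant forgetting factor. It is an illustration under stated assumptions, not the algorithm proposed in the paper: the adaptive forgetting factor, the adaptive model complexity, the drift detector, and the outlier detector are not reproduced, and the names `ForgettingOSELM`, `make_stream`, `forgetting`, and `reg` are hypothetical.

```python
import numpy as np


class ForgettingOSELM:
    """Generic OS-ELM with a constant forgetting factor (illustrative only)."""

    def __init__(self, n_inputs, n_hidden, n_classes,
                 forgetting=0.95, reg=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Random hidden-layer weights are drawn once and never trained (ELM).
        self.W = rng.normal(size=(n_inputs, n_hidden))
        self.b = rng.normal(size=n_hidden)
        self.lam = forgetting            # assumed constant; the paper adapts it
        self.n_classes = n_classes
        self.P = np.eye(n_hidden) / reg  # inverse of the regularised Gram matrix
        self.beta = np.zeros((n_hidden, n_classes))  # output weights

    def _hidden(self, X):
        # Sigmoid activations of the random hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def partial_fit(self, X, y):
        """Update the output weights with one chunk of labelled samples."""
        H = self._hidden(X)
        T = np.eye(self.n_classes)[y]    # one-hot targets
        S = self.lam * np.eye(len(X)) + H @ self.P @ H.T
        K = np.linalg.solve(S, H @ self.P).T   # gain K = P H^T S^{-1}
        self.beta = self.beta + K @ (T - H @ self.beta)
        # Dividing by lam down-weights the contribution of older chunks.
        self.P = (self.P - K @ H @ self.P) / self.lam
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


def make_stream(n, flipped, rng):
    """Toy 2-D concept: class 1 if x0 > 0; `flipped` swaps the labels."""
    X = rng.normal(size=(n, 2))
    y = (X[:, 0] > 0).astype(int)
    return X, (1 - y if flipped else y)
```

Feeding the model chunks from `make_stream(..., flipped=False, ...)` and then from `make_stream(..., flipped=True, ...)` simulates an abrupt drift. With `forgetting` below 1 the older chunks lose weight geometrically, so the output weights can move toward the new concept. With `forgetting=1.0` every chunk keeps equal weight, as in the plain recursive least-squares update.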
