G-mean Weighted Classification Method for Imbalanced Data Stream with Concept Drift
Abstract: Concept drift and class imbalance in data streams seriously degrade the performance and stability of data stream classification algorithms. To address both problems in binary classification of data streams, an online G-mean weighted ensemble method for imbalanced data streams with concept drift, termed OGUEIL (online G-mean update ensemble for imbalance learning), is proposed. It introduces an online update mechanism for component classifier weights into block-based ensemble classification and combines it with resampling and an adaptive sliding window. OGUEIL is built on the ensemble learning framework: whenever a new instance arrives, every component classifier and its weight are updated online, and minority-class instances are randomly oversampled. Each component classifier's weight is determined by its G-mean performance on the most recent instances, which is computed incrementally using a time decay factor. In addition, OGUEIL periodically constructs a class-balanced dataset from the data in the current sliding window, trains a new candidate classifier on it, and selectively adds the candidate to the ensemble. Experimental results on both real-world and synthetic datasets show that the overall performance of the proposed method is better than that of other comparable methods.
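As a rough illustration of the weighting idea described in the abstract, the sketch below keeps a time-decayed G-mean per component classifier and turns those scores into ensemble weights. The abstract does not give the exact update rule, so everything here is an assumption made for illustration: the class name `DecayedGMean`, the parameter `decay`, the per-class decayed counters, and the normalisation in `ensemble_weights` are hypothetical and are not taken from the paper.

```python
import math


class DecayedGMean:
    """Time-decayed G-mean for one binary classifier (illustrative sketch only)."""

    def __init__(self, decay=0.9):
        self.decay = decay         # hypothetical decay factor in (0, 1)
        self.correct = [0.0, 0.0]  # decayed count of correct predictions, per true class
        self.seen = [0.0, 0.0]     # decayed count of instances, per true class

    def update(self, y_true, y_pred):
        # Assumption: only the counters of the true class are decayed, so a long
        # run of majority instances does not erase the minority-class history.
        hit = 1.0 if y_pred == y_true else 0.0
        self.correct[y_true] = self.decay * self.correct[y_true] + hit
        self.seen[y_true] = self.decay * self.seen[y_true] + 1.0

    def value(self):
        # G-mean is the geometric mean of the two per-class recalls.
        recalls = [c / s if s > 0 else 0.0 for c, s in zip(self.correct, self.seen)]
        return math.sqrt(recalls[0] * recalls[1])


def ensemble_weights(trackers):
    """Normalise G-mean scores into weights; fall back to uniform if all are zero."""
    scores = [t.value() for t in trackers]
    total = sum(scores)
    if total == 0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```

In an online loop, each component classifier would predict the incoming instance first, its tracker would then be updated with the true label, and the weights would be recomputed before the next prediction. Because G-mean multiplies the per-class recalls, a classifier that ignores the minority class scores close to zero regardless of its overall accuracy.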
Keywords:
- data stream
- concept drift
- ensemble learning
- class imbalance
- classification