G-mean Weighted Classification Method for Imbalanced Data Stream with Concept Drift
Abstract: Concept drift and class imbalance in data streams seriously degrade the performance and stability of data stream classification algorithms. To address both problems in binary data stream classification, this paper proposes an online G-mean weighted ensemble classification method, termed OGUEIL (online G-mean update ensemble for imbalance learning). The method introduces an online update mechanism for component classifier weights into block-based ensemble classification and combines it with hybrid resampling and an adaptive sliding window. Built on the ensemble learning framework, OGUEIL uses a time decay factor to incrementally compute each component classifier's G-mean over the most recent instances and derives the classifier's weight from that value. Whenever a new instance arrives, every component classifier and its weight are updated online, and minority class instances are randomly oversampled. In addition, OGUEIL periodically constructs a class-balanced dataset from the data in the current sliding window, trains a new candidate classifier on it, and selectively adds the candidate to the ensemble. Experimental results on both real-world and synthetic datasets show that the overall performance of the proposed method is better than that of the other methods compared.
Keywords:
- data stream
- concept drift
- ensemble learning
- class imbalance
- classification
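
The abstract states that each component classifier's G-mean is computed incrementally with a time decay factor and then used to set its weight, but it does not give the update rule. The following Python sketch shows one plausible way to do this, as an illustration rather than the paper's formulation. The class `DecayedGMean`, the function `normalized_weights`, the per-class decayed counters, the default `decay=0.9`, and the sum-to-one normalization are all assumptions made for this example.

```python
import math


class DecayedGMean:
    """Time-decayed G-mean for a binary classifier (labels 0 and 1).

    Hypothetical helper: the decayed-counter update is an assumption,
    not the rule used in the paper.
    """

    def __init__(self, decay=0.9):
        if not 0.0 < decay <= 1.0:
            raise ValueError("decay must be in (0, 1]")
        self.decay = decay
        self.seen = [0.0, 0.0]     # decayed count of instances per true class
        self.correct = [0.0, 0.0]  # decayed count of correct predictions per class

    def update(self, y_true, y_pred):
        # Fade the contribution of older instances, then add the new one.
        for c in (0, 1):
            self.seen[c] *= self.decay
            self.correct[c] *= self.decay
        self.seen[y_true] += 1.0
        if y_pred == y_true:
            self.correct[y_true] += 1.0

    def value(self):
        # G-mean = sqrt(recall on class 0 * recall on class 1).
        recalls = []
        for c in (0, 1):
            if self.seen[c] == 0.0:
                return 0.0  # a class not yet observed gives no evidence
            recalls.append(self.correct[c] / self.seen[c])
        return math.sqrt(recalls[0] * recalls[1])


def normalized_weights(trackers):
    """Turn per-classifier G-mean values into weights that sum to 1 (assumed scheme)."""
    scores = [t.value() for t in trackers]
    total = sum(scores)
    if total == 0.0:
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]
```

In an online loop, each ensemble member would own one tracker and call `update(y, prediction)` when a labeled instance arrives, after which `normalized_weights` gives the voting weights. Because the G-mean multiplies the per-class recalls, a classifier that ignores the minority class receives a weight near zero even if its overall accuracy is high.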