ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (7): 1487-1498. doi: 10.7544/issn1000-1239.2015.20140182

• Artificial Intelligence •

An Algorithm of Robust Online Extreme Learning Machine for Dynamic Imbalanced Datasets

Zhang Jing, Feng Lin

  1. (School of Computer Science and Technology, Faculty of Electronic Information and Electrical Engineering, Dalian University of Technology, Dalian, Liaoning 116024) (School of Innovation Experiment, Dalian University of Technology, Dalian, Liaoning 116024) (zhangjing_0412@163.com)
  • Published: 2015-07-01
  • Online: 2015-07-01
  • Supported by:
    National Natural Science Foundation of China (61173163, 51105052, 61370200)

Abstract: With the arrival of the big data era, dynamic data has appeared in a growing range of application fields, such as safety monitoring, financial forecasting, and medical diagnostics. Although existing knowledge discovery and data mining techniques have been successful in many real-world applications, dynamic data is difficult to classify because its volume changes over time and its class distribution is imbalanced and unstable. To address these problems, this paper presents a robust weighted online sequential extreme learning machine algorithm (RWOSELM), which extends the online sequential extreme learning machine (OSELM). To handle class imbalance, RWOSELM uses cost-sensitive learning theory to generate a local dynamic weight matrix, which optimizes the empirical risk of the classification model. RWOSELM also accounts for changes in data distribution caused by the changing temporal properties of dynamic data, and introduces a forgetting factor to make the classifier more sensitive to such changes. The method can therefore handle data with an imbalanced class distribution while remaining robust on dynamic data. In experiments on 24 imbalanced dynamic datasets with different distributions, RWOSELM achieved good results.

Key words: imbalanced dataset, extreme learning machine, online sequential extreme learning machine, cost sensitive learning, forgetting factor
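
The two ingredients named in the abstract, a per-chunk cost-sensitive weight matrix and a forgetting factor, can be illustrated with a generic weighted recursive least-squares update on top of a random-feature (ELM) hidden layer. The sketch below is not the paper's RWOSELM: the abstract does not give the update equations, so the class name `WeightedForgettingOSELM`, the inverse-class-frequency weighting in `class_balanced_weights`, the sigmoid activation, and the parameters `forgetting` and `reg` are all assumptions chosen for illustration.

```python
import numpy as np


def class_balanced_weights(y):
    """Assumed cost scheme: weight each sample by 1 / (its class count in this chunk)."""
    classes, counts = np.unique(y, return_counts=True)
    count_of = dict(zip(classes.tolist(), counts.tolist()))
    return np.array([1.0 / count_of[c] for c in y.tolist()])


class WeightedForgettingOSELM:
    """Illustrative weighted online ELM with a forgetting factor (hypothetical, not the paper's code)."""

    def __init__(self, n_inputs, n_hidden, n_classes, forgetting=0.98, reg=1e-3, seed=0):
        rng = np.random.default_rng(seed)
        # Random, fixed hidden-layer parameters, as in a standard ELM.
        self.A = rng.standard_normal((n_inputs, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.n_classes = n_classes
        self.lam = forgetting                       # 0 < lam <= 1; 1 means no forgetting
        self.P = np.eye(n_hidden) / reg             # inverse of the regularized information matrix
        self.beta = np.zeros((n_hidden, n_classes))  # output weights

    def _hidden(self, X):
        # Sigmoid hidden-layer output matrix H.
        return 1.0 / (1.0 + np.exp(-(X @ self.A + self.b)))

    def _targets(self, y):
        # One-vs-rest targets in {-1, +1}; y holds integer labels 0..n_classes-1.
        T = -np.ones((len(y), self.n_classes))
        T[np.arange(len(y)), y] = 1.0
        return T

    def partial_fit(self, X, y):
        H, T = self._hidden(X), self._targets(y)
        w = class_balanced_weights(y)               # local weights recomputed for each chunk
        # Weighted RLS with forgetting:
        #   K    = P H^T (lam W^-1 + H P H^T)^-1
        #   beta = beta + K (T - H beta)
        #   P    = (P - K H P) / lam
        S = self.lam * np.diag(1.0 / w) + H @ self.P @ H.T
        K = np.linalg.solve(S, H @ self.P).T        # uses symmetry of S and P
        self.beta = self.beta + K @ (T - H @ self.beta)
        self.P = (self.P - K @ H @ self.P) / self.lam
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.standard_normal((300, 5))
    y = (X[:, 0] > 1.0).astype(int)                 # minority class is roughly 16% of samples
    model = WeightedForgettingOSELM(n_inputs=5, n_hidden=30, n_classes=2)
    for start in range(0, 300, 50):                 # feed the stream chunk by chunk
        model.partial_fit(X[start:start + 50], y[start:start + 50])
    print("predicted class counts:", np.bincount(model.predict(X), minlength=2))
```

With `forgetting=1.0` the recursion reproduces the batch weighted ridge solution over all chunks seen so far; values below 1 discount older chunks geometrically, which is one common way to make an online learner track a drifting distribution.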

CLC Number: