ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (5): 1071-1079. doi: 10.7544/issn1000-1239.2015.20140275

• Artificial Intelligence •

Concept Drift Detection Based on Parallel Reducts

Deng Dayong1, Xu Xiaoyu1, Huang Houkuan2

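  1. 1(College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004); 2(School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044) (dayongd@163.com)
  • Published: 2015-05-01
  • Supported by: 
    the National Natural Science Foundation of China (61473030); the Natural Science Foundation of Zhejiang Province (Y15F020044); the Zhejiang Provincial Natural Science Foundation for Young Scientists (Q13F020006); the Open Fund of the Provincial Top Key Discipline of Computer Software and Theory, Zhejiang Normal University (ZSDZZZZXK27)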

Concept Drifting Detection for Categorical Evolving Data Based on Parallel Reducts

Deng Dayong1, Xu Xiaoyu1, Huang Houkuan2   

  1. 1(College of Mathematics, Physics and Information Engineering, Zhejiang Normal University, Jinhua, Zhejiang 321004); 2(School of Computer and Information Technology, Beijing Jiaotong University, Beijing 100044)
  • Online: 2015-05-01

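Abstract (from the Chinese): Data stream mining is a current hot topic in data mining research, and concept drift detection is an important research direction within it. Although there are many methods for detecting concept drift, they share some common drawbacks: they do not remove redundant attributes as a whole, and they rely on external properties to detect concept drift (for example, classification accuracy on external data). Using the basic principles and methods of rough sets and F-rough sets, this paper treats the sliding windows of a data stream as a family of decision subtables, proposes a method that applies parallel reducts to the data stream to remove redundant attributes as a whole, and uses the change in attribute significance across the family of decision subtables after parallel reduction to detect concept drift. Unlike traditional methods, the new method uses the internal characteristics of the data to detect concept drift. Experimental results show that the method can effectively remove redundant attributes as a whole and detect concept drift, and that attribute significance based on mutual information performs somewhat better at detecting concept drift than attribute significance based on the positive region.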

Key words (from the Chinese): data stream, concept drift, rough sets, F-rough sets, parallel reducts

Abstract: Data stream mining is an active topic in data mining, and concept drift detection is one of its important research directions. Many methods for detecting concept drift exist, but they share some drawbacks: they do not remove redundant attributes across the sliding windows as a whole, and they detect drift through external properties (for example, classification accuracy on external data). Based on the basic principles of rough sets and F-rough sets, the sliding windows of a data stream are regarded as a family of decision subsystems, and the significance of conditional attributes is used to detect concept drift. The method has two steps: first, redundant attributes in the streaming data are removed through parallel reducts; then, concept drift is detected from the change in attribute significance across the decision subsystems. Unlike existing methods, it uses the internal properties of the data stream to detect concept drift. Experiments show that the method removes redundant attributes as a whole and detects concept drift effectively, and that attribute significance based on mutual information detects concept drift somewhat better than attribute significance based on the positive region. For data stream mining, this paper provides a new method for detecting concept drift; for rough set theory, it offers a new application area.

Key words: data streams, concept drift, rough sets, F-rough sets, parallel reducts
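
The second step described in the abstract, detecting drift from changes in attribute significance across sliding windows, can be illustrated with a minimal Python sketch. It assumes the textbook rough-set definition of positive-region significance (the drop in the dependency degree when an attribute is removed), which may differ from the F-rough-set definitions used in the paper. The function names, the toy windows, and the `threshold` value are hypothetical, and the parallel-reduct step is not shown.

```python
from collections import defaultdict

def dependency(rows, attrs, decision):
    """Fraction of rows in the positive region of `decision` w.r.t. `attrs`."""
    groups = defaultdict(list)
    for row in rows:
        groups[tuple(row[a] for a in attrs)].append(row[decision])
    # A group is consistent when all of its rows share one decision value.
    consistent = sum(len(ds) for ds in groups.values() if len(set(ds)) == 1)
    return consistent / len(rows)

def significance(rows, attrs, decision):
    """Drop in dependency when each attribute is removed from `attrs`."""
    full = dependency(rows, attrs, decision)
    return {a: full - dependency(rows, [b for b in attrs if b != a], decision)
            for a in attrs}

def drift_points(windows, attrs, decision, threshold=0.2):
    """Indices of windows whose significance shifts by more than `threshold`
    (an assumed cut-off) relative to the preceding window."""
    sigs = [significance(w, attrs, decision) for w in windows]
    return [i for i in range(1, len(sigs))
            if max(abs(sigs[i][a] - sigs[i - 1][a]) for a in attrs) > threshold]

# Toy stream: the decision d follows attribute a in the first two windows,
# then switches to following attribute b in the third.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
w_a = [{"a": a, "b": b, "d": a} for a, b in pairs]
w_b = [{"a": a, "b": b, "d": b} for a, b in pairs]
windows = [w_a, w_a, w_b]

print(significance(w_a, ["a", "b"], "d"))
print(drift_points(windows, ["a", "b"], "d"))
```

In this toy stream the significance of `a` and `b` swaps in the third window, so only that window is flagged. A mutual-information variant would replace `dependency` with an information-theoretic measure while keeping the same window-to-window comparison.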
