Concept Drift Detection Based on Parallel Reducts

Deng Dayong (邓大勇); Xu Xiaoyu (徐小玉); Huang Houkuan (黄厚宽)

doi:10.7544/issn1000-1239.2015.20140275

Concept Drifting Detection for Categorical Evolving Data Based on Parallel Reducts

Abstract

Data stream mining is an active topic in data mining, and concept drift detection is one of its important research directions. Many methods for detecting concept drift exist, but they share some drawbacks: they do not remove redundant attributes globally across sliding windows, and they detect concept drift through external properties (for example, classification accuracy on external data). Based on the basic principles and methods of rough sets and F-rough sets, this paper treats the sliding windows of a data stream as a family of decision subtables and proceeds in two steps: first, redundant attributes in the data stream are removed globally through parallel reducts; then, concept drift is detected from the change in the attribute significance of the conditional attributes across the reduced family of decision subtables. Unlike existing methods, the new method uses the internal properties of the data to detect concept drift. Experiments show that the method effectively removes redundant attributes globally and detects concept drift, and that attribute significance based on mutual information detects concept drift somewhat better than attribute significance based on the positive region. For data stream mining, this paper provides a new method for detecting concept drift; for rough set theory, it offers a new application area.
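The detection step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: it uses the textbook rough-set definitions of positive-region dependency and conditional entropy, skips the parallel-reduct step entirely, and the function names, the toy windows, and the `threshold` value in `drift` are all assumptions chosen for illustration. The abstract does not state the paper's actual drift criterion.

```python
from collections import Counter, defaultdict
from math import log2


def partition(rows, attrs):
    """Group row indices by their values on the chosen condition attributes."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())


def gamma(rows, labels, attrs):
    """Dependency degree |POS_B(D)| / |U|: share of rows in label-pure blocks."""
    pos = sum(len(b) for b in partition(rows, attrs)
              if len({labels[i] for i in b}) == 1)
    return pos / len(rows)


def cond_entropy(rows, labels, attrs):
    """Conditional entropy H(D | B) in bits."""
    n = len(rows)
    h = 0.0
    for b in partition(rows, attrs):
        counts = Counter(labels[i] for i in b)
        h -= sum(c / n * log2(c / len(b)) for c in counts.values())
    return h


def significance(rows, labels, attrs, measure="pos"):
    """Significance of each attribute: the loss incurred by dropping it.

    measure="pos": drop in positive-region dependency.
    measure="mi":  drop in mutual information I(B; D), which equals
                   H(D | B minus a) - H(D | B).
    """
    sig = {}
    for a in attrs:
        rest = [x for x in attrs if x != a]
        if measure == "pos":
            sig[a] = gamma(rows, labels, attrs) - gamma(rows, labels, rest)
        else:
            sig[a] = (cond_entropy(rows, labels, rest)
                      - cond_entropy(rows, labels, attrs))
    return sig


def drift(sig_prev, sig_curr, threshold=0.3):
    """Hypothetical rule: flag drift when any significance moves by more than threshold."""
    return max(abs(sig_curr[a] - sig_prev[a]) for a in sig_prev) > threshold


# Two toy sliding windows over the same condition attributes (columns 0 and 1).
rows = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels_w1 = [0, 0, 1, 1]  # decision follows attribute 0
labels_w2 = [0, 1, 0, 1]  # decision follows attribute 1

sig_w1 = significance(rows, labels_w1, [0, 1], measure="mi")
sig_w2 = significance(rows, labels_w2, [0, 1], measure="mi")
print(sig_w1, sig_w2, drift(sig_w1, sig_w2))
```

In the toy data the decision switches from depending on attribute 0 to depending on attribute 1, so both measures shift all significance from one attribute to the other and the hypothetical rule flags a drift. In the paper's setting, the attribute set passed to `significance` would be a parallel reduct shared by all windows rather than the full attribute set.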
