ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2014, Vol. 51, Issue (10): 2277-2294. doi: 10.7544/issn1000-1239.2014.20130718


MMCKDE: m-Mixed Clustering Kernel Density Estimation over Data Streams

Xu Min (1,2), Deng Zhaohong (1), Wang Shitong (1), Shi Yingzhong (1,2)

  1. School of Digital Media, Jiangnan University, Wuxi, Jiangsu 214122
  2. School of Internet of Things Technology, Wuxi Institute of Technology, Wuxi, Jiangsu 214121
  • Online: 2014-10-01

Abstract: In many data stream mining applications, traditional density estimation methods such as kernel density estimation and reduced set density estimation cannot be applied to data streams because of their high computational cost and large storage requirements. To reduce the time and space complexity, a novel online density estimation method for data streams based on m-mixed clustering kernels is proposed. In the proposed method, MMCKDE nodes are created from a fixed number of mixed clustering kernels that capture cluster information, instead of retaining all the kernels produced by other density estimation methods. To further reduce storage, MMCKDE nodes can be merged according to their KL divergence. The resulting model can then estimate the probability density function over an arbitrary time period or over the entire time span. We compare the MMCKDE algorithm with the SOMKE algorithm in terms of density estimation accuracy and running time on various stationary data sets, and we also investigate the use of MMCKDE over evolving data streams. The experimental results illustrate the effectiveness and efficiency of the proposed method.
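The abstract gives no formulas, so the following is only a minimal sketch of the general idea of merging kernels by KL divergence, not the MMCKDE algorithm itself. It assumes univariate Gaussian kernels, a symmetrised KL divergence as the merge criterion, and a moment-preserving merge; the names `Kernel`, `kl_gauss`, `sym_kl`, `merge`, `merge_closest`, and `density` are hypothetical and do not come from the paper.

```python
import math
from dataclasses import dataclass


@dataclass
class Kernel:
    """A weighted univariate Gaussian kernel (hypothetical structure)."""
    weight: float  # number of stream samples this kernel summarises
    mean: float
    var: float


def kl_gauss(p: Kernel, q: Kernel) -> float:
    """Closed-form KL(p || q) between two univariate Gaussians."""
    return 0.5 * (math.log(q.var / p.var)
                  + (p.var + (p.mean - q.mean) ** 2) / q.var
                  - 1.0)


def sym_kl(p: Kernel, q: Kernel) -> float:
    """Symmetrised KL divergence, used here as the merge cost (an assumption)."""
    return kl_gauss(p, q) + kl_gauss(q, p)


def merge(p: Kernel, q: Kernel) -> Kernel:
    """Merge two kernels while preserving total weight, mean, and variance."""
    w = p.weight + q.weight
    mean = (p.weight * p.mean + q.weight * q.mean) / w
    var = (p.weight * (p.var + (p.mean - mean) ** 2)
           + q.weight * (q.var + (q.mean - mean) ** 2)) / w
    return Kernel(w, mean, var)


def merge_closest(kernels: list[Kernel]) -> list[Kernel]:
    """Replace the pair with the smallest symmetrised KL by its merged kernel."""
    if len(kernels) < 2:
        return list(kernels)
    pairs = [(i, j) for i in range(len(kernels))
             for j in range(i + 1, len(kernels))]
    i, j = min(pairs, key=lambda ij: sym_kl(kernels[ij[0]], kernels[ij[1]]))
    rest = [k for idx, k in enumerate(kernels) if idx not in (i, j)]
    return rest + [merge(kernels[i], kernels[j])]


def density(kernels: list[Kernel], x: float) -> float:
    """Evaluate the weighted Gaussian mixture density at x."""
    total = sum(k.weight for k in kernels)
    return sum(
        (k.weight / total)
        * math.exp(-((x - k.mean) ** 2) / (2.0 * k.var))
        / math.sqrt(2.0 * math.pi * k.var)
        for k in kernels
    )
```

Calling `merge_closest` repeatedly until the list reaches a fixed size bounds the memory used, which is the kind of saving the abstract attributes to merging; how MMCKDE actually builds and merges its nodes is described in the full paper.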

Key words: m-mixed clustering kernel, kernel density estimation, probability density function, Kullback-Leibler (KL) divergence, streaming data mining
