ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (11): 2518-2527. doi: 10.7544/issn1000-1239.2014.20130869


An Adaptive Grid-Density Based Data Stream Clustering Algorithm Based on Uncertainty Model

Liu Zhuo (1), Yang Yue (2), Zhang Jianpei (2), Yang Jing (2), Chu Yan (2), Zhang Zebao (2)

  (1) College of Automation, Harbin Engineering University, Harbin 150001; (2) College of Computer Science and Technology, Harbin Engineering University, Harbin 150001
  • Online: 2014-11-01

Abstract: Uncertain data streams are a new and widespread form of data that is emerging in many application fields with the development of computing and sensing technology, and their analysis and processing have attracted the attention of many researchers. Existing data stream clustering techniques generally ignore the uncertainty of the data, which often makes the clustering results unreasonable or even unusable. The two aspects of uncertainty, existence uncertainty and attribute uncertainty, can significantly affect the clustering process and its results, yet existing related work does not consider both at the same time. In addition, the recently reported clustering algorithms are all based on the K-Means algorithm and inherit its shortcomings. To solve these problems, ADC-UStream, an adaptive grid-density based clustering algorithm for data streams under an uncertainty model, is proposed. To handle uncertainty, the algorithm treats existence uncertainty and attribute uncertainty under a unified strategy and builds an entropy-based uncertainty model to measure it. Taking this combined uncertainty into account, the algorithm adopts grid-density based clustering over an attenuation (decaying) window model and designs a temporally and spatially adaptive density threshold, in order to adapt to the temporal and non-uniform distribution characteristics of uncertain data streams. The experimental results show that ADC-UStream under the uncertainty model performs well in both clustering quality and clustering efficiency.

Key words: uncertain character, data stream, clustering, grid-density, adaptive density threshold, uncertainty model
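
The abstract names three ingredients: an entropy-based uncertainty measure, grid densities maintained over an attenuation window, and an adaptive density threshold. The abstract gives no formulas, so the following is only a minimal Python sketch of how such pieces could fit together, not the ADC-UStream algorithm itself. Everything in it is an assumption made for illustration: the names `certainty` and `DecayedGrid`, the representation of a record as candidate positions with probabilities (the leftover probability meaning the record does not exist), the `2 ** (-decay * dt)` attenuation, and the mean-density threshold, which is a simple stand-in for the paper's temporal and spatial adaptive threshold.

```python
import math


def certainty(candidates):
    """Return 1 - normalized Shannon entropy of one uncertain record.

    Assumption: a record is a list of (point, probability) candidates whose
    probabilities sum to at most 1; the remainder is the probability that
    the record does not exist. This covers both kinds of uncertainty.
    """
    probs = [p for _, p in candidates]
    absent = max(0.0, 1.0 - sum(probs))
    outcomes = [p for p in probs + [absent] if p > 0.0]
    if len(outcomes) <= 1:
        return 1.0  # a single possible outcome carries no uncertainty
    entropy = -sum(p * math.log2(p) for p in outcomes)
    return 1.0 - entropy / math.log2(len(probs) + 1)


class DecayedGrid:
    """Grid cells whose densities fade over time (hypothetical sketch)."""

    def __init__(self, cell_size, decay=0.01, factor=1.0):
        self.cell_size = cell_size
        self.decay = decay      # assumed attenuation rate
        self.factor = factor    # assumed multiplier on the mean density
        self.cells = {}         # cell index -> (density, last update time)

    def _cell(self, point):
        return tuple(math.floor(x / self.cell_size) for x in point)

    def _density_at(self, cell, t):
        density, t_last = self.cells.get(cell, (0.0, t))
        return density * 2.0 ** (-self.decay * (t - t_last))

    def insert(self, candidates, t):
        """Add one uncertain record at time t, weighted by its certainty."""
        weight = certainty(candidates)
        for point, p in candidates:
            cell = self._cell(point)
            self.cells[cell] = (self._density_at(cell, t) + p * weight, t)

    def dense_cells(self, t):
        """Cells at or above factor * mean density of non-empty cells at t."""
        densities = {c: self._density_at(c, t) for c in self.cells}
        if not densities:
            return set()
        threshold = self.factor * sum(densities.values()) / len(densities)
        return {c for c, d in densities.items() if d >= threshold}


# Usage: three certain records in one cell, one in a distant cell.
grid = DecayedGrid(cell_size=1.0, decay=0.0)
for _ in range(3):
    grid.insert([((0.2, 0.7), 1.0)], t=0)
grid.insert([((5.5, 5.5), 1.0)], t=0)
print(grid.dense_cells(t=0))
```

In this sketch a record that is equally likely to exist or not contributes nothing, while a fully certain record contributes a full unit of density. Because the threshold is recomputed from the current decayed densities, it moves with the stream; a real implementation would presumably also group adjacent dense cells into clusters.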
