Abstract:
Cluster analysis of data streams has become an active research topic. This paper presents CAStream, a novel subspace-based algorithm for clustering and evolution analysis over high-dimensional data streams. CAStream partitions the data space into grids, computes summary statistics for each grid using an approximate method, stores snapshots of potentially dense grids in an improved pyramidal time frame, and finally finds the clusters and analyzes their evolution with a depth-first search. CAStream can handle high-dimensional data streams and discover clusters of arbitrary shape. Experimental results on real and synthetic datasets show that the approach is promising and practical.
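The grid-and-search idea can be illustrated with a minimal sketch. It is not the CAStream implementation: it assumes fixed-width grid cells, exact per-cell counts in place of the approximate summary statistics, a simple count threshold for density, and adjacency along a single dimension. It leaves out the subspace handling and the pyramidal time frame entirely. The names `grid_cell`, `dense_cells`, `neighbors`, `find_clusters`, `width` and `min_count` are hypothetical.

```python
from collections import defaultdict


def grid_cell(point, width):
    """Map a point to the integer coordinates of its grid cell."""
    return tuple(int(x // width) for x in point)


def dense_cells(points, width, min_count):
    """Count points per cell and keep cells with at least min_count points.

    Assumption: exact counts stand in for approximate summary statistics.
    """
    counts = defaultdict(int)
    for p in points:
        counts[grid_cell(p, width)] += 1
    return {cell for cell, n in counts.items() if n >= min_count}


def neighbors(cell):
    """Yield cells that differ by one step along exactly one dimension."""
    for i in range(len(cell)):
        for step in (-1, 1):
            yield cell[:i] + (cell[i] + step,) + cell[i + 1:]


def find_clusters(dense):
    """Group adjacent dense cells into clusters with an iterative DFS."""
    seen, clusters = set(), []
    for start in dense:
        if start in seen:
            continue
        stack, component = [start], set()
        seen.add(start)
        while stack:
            cell = stack.pop()
            component.add(cell)
            for nb in neighbors(cell):
                if nb in dense and nb not in seen:
                    seen.add(nb)
                    stack.append(nb)
        clusters.append(component)
    return clusters


if __name__ == "__main__":
    points = [(0.1, 0.1), (0.2, 0.3), (1.1, 0.2), (1.4, 0.5),
              (5.2, 5.1), (5.3, 5.4), (9.0, 9.0)]
    dense = dense_cells(points, width=1.0, min_count=2)
    for cluster in find_clusters(dense):
        print(sorted(cluster))
```

Because a cluster is a connected set of dense cells rather than a region around a centroid, this kind of search can follow arbitrarily shaped clusters, which is the property the abstract claims for CAStream.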