A Dynamic Data Stream Clustering Algorithm Based on Probability and Exemplar
-
Graphical Abstract
-
Abstract
We propose an efficient probability drifting dynamic α-expansion clustering algorithm, which is designed for data stream clustering problem. In this paper, we first develop a unified target function of both affinity propagation (AP) and enhanced α-expansion move (EEM) clustering algorithms, namely the probability exemplar-based clustering algorithm. Then a probability drifting dynamic α-expansion (PDDE) clustering algorithm has been proposed considering the probability framework. The framework is capable of dealing with data stream clustering problem when current data points are similar with pervious data points. In the process of clustering, the proposed algorithm ensures that the clustering result of current data points is at least comparable well with that of previous data points. What’s more, the proposed algorithm is capable of dealing with two kinds of similarities between current and previous data points, that is whether current data points share some points with previous data points or not. Besides, experiments based on both synthetic (D31, Birch 3) and real-world dataset (Forest Covertype, KDD CUP99) have indicated the capability of PDDE in clustering data streams. The advantage of the proposed clustering algorithm in contrast to both AP and EEM algorithms has been shown as well.
-
-