Abstract:
Clustering by fast search and find of density peaks (density peaks clustering algorithm, DPC) is a new clustering analysis algorithm proposed in 2014. It draws decision graphs according to the theory that the cluster centers have larger local density and the distance between larger density points and cluster centers is far away. And then it finds the density peaks, as to obtain any shapes of the clusters. However, as the computation of local density and distance relies on the similarity matrix in the process of finding cluster centers, the computational complexity is relatively higher, which limits the application of DPC in large-scale datasets. In this paper, a density peaks clustering algorithm based on grid screening (SDPC) is proposed. SDPC removes the points with sparse local density based on the grid method according to the uneven distribution of datasets firstly. And then the cluster centers are selected by drawing the decision graph based on DPC, which reduces the computational complexity effectively on the basis of ensuring the clustering accuracy. Theoretical analysis and experimental results show that DPC based on grid screening can not only cluster large-scale datasets correctly, but also reduce the time complexity greatly.