ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2018, Vol. 55 ›› Issue (11): 2419-2429. doi: 10.7544/issn1000-1239.2018.20170227


Large-Scale Density Peaks Clustering Algorithm Based on Grid Screening

Xu Xiao, Ding Shifei, Sun Tongfeng, Liao Hongmei   

  1. (School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116) (xu_xiao@cumt.edu.cn)
  • Online: 2018-11-01

Abstract: Clustering by fast search and find of density peaks (the density peaks clustering algorithm, DPC) is a clustering algorithm proposed in 2014. It draws a decision graph based on the assumption that cluster centers have a higher local density than their neighbors and lie relatively far from any point of higher density. It then identifies the density peaks and can thereby recover clusters of arbitrary shape. However, because computing the local density and the distance in the search for cluster centers relies on the similarity matrix, the computational complexity is relatively high, which limits the application of DPC to large-scale datasets. This paper proposes a density peaks clustering algorithm based on grid screening (SDPC). Exploiting the uneven distribution of the data, SDPC first uses a grid method to remove points with sparse local density. It then selects the cluster centers by drawing the decision graph as in DPC, which effectively reduces the computational complexity while preserving clustering accuracy. Theoretical analysis and experimental results show that SDPC not only clusters large-scale datasets correctly but also greatly reduces the time complexity.
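
The two stages described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not give the screening rule, so the uniform grid, the `cell_size` and `min_count` parameters, and all function names are assumptions made for illustration. The local density uses the cutoff-kernel definition from the original DPC formulation, and ranking points by the product of density and distance is one common heuristic for reading the decision graph, not necessarily the selection rule used in the paper.

```python
import math
from collections import defaultdict


def grid_screen(points, cell_size, min_count):
    """Keep only points in grid cells that hold at least min_count points.

    Assumed screening rule: the paper's exact criterion is not stated
    in the abstract.
    """
    cells = defaultdict(list)
    for p in points:
        # Map each point to the integer coordinates of its grid cell.
        key = tuple(math.floor(c / cell_size) for c in p)
        cells[key].append(p)
    return [p for members in cells.values()
            if len(members) >= min_count
            for p in members]


def dpc_decision_values(points, d_c):
    """Return (rho, delta) for each point, as plotted in a DPC decision graph.

    rho[i]   = number of other points closer than the cutoff distance d_c.
    delta[i] = distance to the nearest point of strictly higher density;
               points with no denser neighbour get their largest distance.
    """
    n = len(points)
    # Full pairwise distance matrix: the O(n^2) cost that screening reduces.
    dist = [[math.dist(a, b) for b in points] for a in points]
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return rho, delta


def pick_centers(rho, delta, k):
    """Return the indices of the k points with the largest rho * delta.

    Heuristic stand-in for choosing centers by eye on the decision graph.
    """
    order = sorted(range(len(rho)), key=lambda i: rho[i] * delta[i],
                   reverse=True)
    return sorted(order[:k])
```

Screening first means the quadratic distance matrix is built only over the points that survive, which is where the savings on large datasets would come from under these assumptions. How the removed sparse points are later assigned to clusters is not described in the abstract and is left out of the sketch.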

Key words: density peaks clustering algorithm, grid screening, decision graph, computational complexity, large-scale datasets
