Abstract:
Clustering is an important analytical tool in data mining, and density-based clustering is a family of methods often called on to handle very large databases. Motivated by the limitations of existing density-based clustering algorithms, in particular their difficulty with data of varying density and with indistinct cluster boundaries, this work introduces the notions of projection points, neighborhood balance, balanceable core points, and boundary sparse points. Based on an analysis of the distribution characteristics of core points and of the points in their neighborhoods, it proposes bDBSCAN, a density-based clustering algorithm that improves DBSCAN by taking the neighborhood balance of core points into account. For each core point, the algorithm projects the points in its neighborhood to judge whether the core point is balanceable; only balanceable core points are expanded to form clusters. The algorithm effectively and efficiently discovers clusters of arbitrary shape and with varied data distribution characteristics, and it eliminates noise such as boundary sparse points. Theoretical analysis and experimental results indicate that the algorithm improves clustering accuracy, gives better clustering results on a variety of data sets, and addresses difficulties in clustering high-dimensional spatial data, such as indistinct boundaries between clusters and excessive noise points. The choice and impact of the algorithm's parameter are also discussed.
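
The idea that only balanceable core points are expanded can be illustrated with a minimal sketch. The abstract does not give the formal definitions of projection points or neighborhood balance, so the balance test below is an assumption made purely for illustration: neighbors are projected onto each coordinate axis through the core point, and the point counts as balanced when the two sides of every axis hold comparable numbers of neighbors. The names `region_query`, `is_balanced`, `balanced_dbscan`, and the threshold `min_ratio` are hypothetical and are not taken from bDBSCAN itself.

```python
import math


def region_query(points, i, eps):
    """Indices of all points within distance eps of points[i], excluding i."""
    return [j for j, q in enumerate(points)
            if j != i and math.dist(points[i], q) <= eps]


def is_balanced(points, i, neighbors, min_ratio):
    """Hypothetical balance test (an assumption, not the paper's definition).

    Project each neighbor onto every coordinate axis through points[i] and
    compare how many fall on the negative side versus the positive side.
    """
    p = points[i]
    for axis in range(len(p)):
        neg = sum(1 for j in neighbors if points[j][axis] < p[axis])
        pos = sum(1 for j in neighbors if points[j][axis] > p[axis])
        if max(neg, pos) == 0:
            continue  # every neighbor shares this coordinate; axis is uninformative
        if min(neg, pos) / max(neg, pos) < min_ratio:
            return False  # neighbors lie mostly on one side of the point
    return True


def balanced_dbscan(points, eps, min_pts, min_ratio):
    """DBSCAN-style clustering in which only balanced core points expand.

    Returns one label per point: cluster ids 0, 1, ... or -1 for noise.
    """
    n = len(points)
    neighbors = [region_query(points, i, eps) for i in range(n)]
    # A core point has at least min_pts points in its neighborhood, itself included.
    expandable = [len(neighbors[i]) + 1 >= min_pts
                  and is_balanced(points, i, neighbors[i], min_ratio)
                  for i in range(n)]
    labels = [None] * n
    cluster = 0
    for i in range(n):
        if labels[i] is not None or not expandable[i]:
            continue
        labels[i] = cluster
        frontier = [i]
        while frontier:
            k = frontier.pop()
            for j in neighbors[k]:
                if labels[j] is None:
                    labels[j] = cluster  # reached points join the cluster
                    if expandable[j]:
                        frontier.append(j)  # only balanced core points expand
        cluster += 1
    return [-1 if label is None else label for label in labels]


# Two dense 5x5 grids, one isolated point, and one sparse point just off
# the right-hand edge of the first grid.
grid_a = [(float(x), float(y)) for x in range(5) for y in range(5)]
grid_b = [(float(x), float(y)) for x in range(20, 25) for y in range(5)]
points = grid_a + grid_b + [(10.0, 10.0), (5.4, 2.0)]
labels = balanced_dbscan(points, eps=1.5, min_pts=4, min_ratio=0.5)
```

In this toy data set the points on the edge of each grid have neighbors on one side only, so under the assumed test they join a cluster but do not expand it, and the sparse point hanging off the edge is left as noise instead of being absorbed. A real implementation would replace `is_balanced` with the paper's projection-based definition and use a spatial index in place of the quadratic `region_query`.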