Abstract:
Clustering is an important research topic in data mining. Clustering in high-dimensional space is especially difficult because of the spatial distribution of the data, the large number of noise points, and the phenomenon that the difference between the distances from a data point to its nearest and farthest neighbors tends to zero relative to those distances. By analyzing the limitations of existing algorithms, we introduce definitions such as the k-neighborhood set and the k-radius, and propose a local-density-based k-neighborhood clustering algorithm, k-PCLDHD, to solve this problem. To improve its efficiency, an optimized algorithm, k-LDCHD, is proposed. It applies the definition of reference distance to preprocess the data set; by using two reference points, it avoids many scans of the data set and greatly improves efficiency. Theoretical analysis and experimental results indicate that the algorithm solves the problem of clustering in high-dimensional space effectively and efficiently.
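The abstract does not spell out how the reference-distance pretreatment works, so the following is only a minimal sketch under stated assumptions, not the authors' k-LDCHD. It assumes Euclidean distance, takes the k-radius to be the distance from a point to its k-th nearest neighbor, and uses the triangle inequality |d(x,r) - d(y,r)| <= d(x,y) with two precomputed reference points to skip exact distance computations while building a k-neighborhood set. The choice of reference points (the centroid and the point farthest from it) and all function names are hypothetical.

```python
import heapq
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def dist(a: Point, b: Point) -> float:
    """Euclidean distance between two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reference_distances(data: List[Point]) -> List[Tuple[float, float]]:
    """Pretreatment: distance of every point to two reference points.

    Hypothetical choice: r1 is the centroid, r2 is the point farthest from r1.
    """
    dim = len(data[0])
    r1 = [sum(p[d] for p in data) / len(data) for d in range(dim)]
    r2 = max(data, key=lambda p: dist(p, r1))
    return [(dist(p, r1), dist(p, r2)) for p in data]


def k_neighborhood(data: List[Point], refs: List[Tuple[float, float]],
                   i: int, k: int) -> Tuple[List[int], float, int]:
    """Return (k-neighborhood indices, k-radius, exact distances computed) for point i.

    Assumes 1 <= k <= len(data) - 1.
    """
    def lower_bound(j: int) -> float:
        # Triangle inequality per reference point; the larger gap is still
        # a valid lower bound on dist(data[i], data[j]).
        return max(abs(refs[i][0] - refs[j][0]), abs(refs[i][1] - refs[j][1]))

    # Visit candidates in increasing order of their lower bound.
    candidates = sorted((lower_bound(j), j) for j in range(len(data)) if j != i)
    heap: List[Tuple[float, int]] = []  # max-heap via negated distances
    computed = 0
    for lb, j in candidates:
        if len(heap) == k and lb >= -heap[0][0]:
            break  # no remaining candidate can beat the current k-th distance
        d = dist(data[i], data[j])
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, j))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, j))
    neighbors = sorted(j for _, j in heap)
    k_radius = -heap[0][0]  # largest distance kept = distance to k-th neighbor
    return neighbors, k_radius, computed


if __name__ == "__main__":
    data = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (5.0, 5.0), (6.0, 5.0), (5.0, 6.0)]
    refs = reference_distances(data)
    nbrs, radius, n = k_neighborhood(data, refs, 0, 2)
    print(nbrs, radius)  # prints [1, 2] 1.0
```

Because candidates are visited in order of their lower bound, the loop can stop as soon as that bound reaches the current k-th distance, so points far from the query in reference-distance terms are never compared exactly. A point with a small k-radius lies in a dense region, which is one plausible way a local density could be derived from these quantities.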