
    k-LDCHD: A Local Density Based k-Neighborhood Clustering Algorithm for High Dimensional Space

    • Abstract: Clustering is an important research topic in data mining. Clustering in high-dimensional space is especially difficult because the data are sparsely distributed, noise points are numerous, and the gap between a point's distances to its nearest and farthest neighbors tends to zero. After analyzing the limitations of existing clustering algorithms, this paper introduces concepts such as the k-neighborhood point set and the k-neighborhood radius, and proposes k-PCLDHD, a single-parameter local-density clustering algorithm based on k-neighborhoods for high-dimensional space. To improve efficiency, it further defines the reference distance and preprocesses the data objects using two reference data points, which reduces the cost of scanning the data set; the resulting optimized algorithm is k-LDCHD. Theoretical analysis and experimental results indicate that the algorithm handles clustering in high-dimensional space effectively and is feasible in practice.
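The abstract does not give the paper's formal definitions, so the following is only an illustrative sketch of the two ideas it names: a k-neighborhood set with its k-radius, and a precomputed distance to two reference points used to skip distance computations. It assumes Euclidean distance and triangle-inequality pruning, and all function names, as well as the choice of reference points, are hypothetical rather than taken from k-PCLDHD or k-LDCHD.

```python
import heapq
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length coordinate tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def k_neighborhood(points, i, k):
    """Brute force: indices of the k nearest neighbors of points[i] and the
    distance to the k-th one (taken here as the k-radius; an assumption)."""
    dists = sorted(
        (euclidean(points[i], p), j) for j, p in enumerate(points) if j != i
    )
    nearest = dists[:k]
    return sorted(j for _, j in nearest), nearest[-1][0]


def reference_distances(points, ref_a, ref_b):
    """One pass over the data: distance of every point to two reference points."""
    return [(euclidean(p, ref_a), euclidean(p, ref_b)) for p in points]


def k_neighborhood_pruned(points, ref_dists, i, k):
    """Same neighbors and radius as k_neighborhood (ties aside), but candidates
    are visited in order of a triangle-inequality lower bound, and the scan
    stops once no remaining candidate can beat the current k-th distance.
    Also returns how many true distances were computed."""
    ra, rb = ref_dists[i]
    # |d(i, r) - d(j, r)| <= d(i, j) for any reference point r.
    bounds = sorted(
        (max(abs(ra - a), abs(rb - b)), j)
        for j, (a, b) in enumerate(ref_dists)
        if j != i
    )
    heap = []  # max-heap of the k best so far, stored as (-distance, index)
    computed = 0
    for bound, j in bounds:
        if len(heap) == k and bound >= -heap[0][0]:
            break  # every remaining candidate is at least this far away
        d = euclidean(points[i], points[j])
        computed += 1
        if len(heap) < k:
            heapq.heappush(heap, (-d, j))
        elif d < -heap[0][0]:
            heapq.heapreplace(heap, (-d, j))
    return sorted(j for _, j in heap), -heap[0][0], computed


if __name__ == "__main__":
    pts = [(0, 0), (1, 0), (0, 2), (5, 5), (6, 5), (5, 7)]
    refs = reference_distances(pts, (0, 0), (10, 10))
    for idx in range(len(pts)):
        print(idx, k_neighborhood_pruned(pts, refs, idx, 2))
```

In this sketch the reference distances are computed once and reused for every query point, which is the sense in which preprocessing can reduce repeated scans of the data set. How the paper selects its two reference points and how it defines local density from the k-radius are not stated in the abstract.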

       
