A Weighted Self-Adaptive Similarity Measure

Xiao Yu, Yu Jian


Abstract

Cluster analysis is an important technique in data mining. A key problem for any clustering algorithm is how to measure the dissimilarity or similarity between data points, because the clustering result depends directly on that measure. This is especially true of algorithms built on a similarity matrix, such as spectral clustering. Spectral clustering is a recently developed clustering method. Unlike traditional partitioning algorithms, it is not limited to spherical clusters and can discover clusters of irregular shape. The Gaussian kernel is the similarity measure most commonly used in existing spectral clustering methods. Building on the Gaussian kernel similarity and its modified forms, this paper proposes a weighted self-adaptive similarity measure that can be used in spectral clustering and in other clustering algorithms based on a similarity matrix. The proposed measure not only describes the similarity between data points in data sets whose clusters have different densities, but also reduces the similarity between outliers (noise points) and other data points. Experimental results show that the proposed measure better describes the similarity between data points in various types of data sets and therefore yields better clustering results.
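The abstract does not give the formula of the proposed weighted self-adaptive measure, so the sketch below shows only the two baselines it says it builds on. The first is the standard Gaussian kernel similarity W_ij = exp(-||x_i - x_j||^2 / (2σ^2)) with one global scale σ. The second is one well-known "modified Gaussian kernel": the local-scaling kernel from self-tuning spectral clustering (Zelnik-Manor and Perona), W_ij = exp(-||x_i - x_j||^2 / (σ_i σ_j)), where σ_i is the distance from x_i to its k-th nearest neighbour. Treating that variant as representative of the extensions the authors mean is an assumption. The function names, the default k = 7, and the zero-diagonal convention are illustrative choices, not taken from the paper.

```python
import math
from typing import List, Sequence

Point = Sequence[float]


def sq_dist(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def gaussian_similarity(points: List[Point], sigma: float) -> List[List[float]]:
    """Global Gaussian kernel: W_ij = exp(-||x_i - x_j||^2 / (2 sigma^2)).

    The diagonal is left at 0 (an illustrative convention, common in spectral clustering).
    """
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            s = math.exp(-sq_dist(points[i], points[j]) / (2.0 * sigma ** 2))
            w[i][j] = w[j][i] = s  # the kernel is symmetric
    return w


def local_scales(points: List[Point], k: int) -> List[float]:
    """sigma_i = Euclidean distance from x_i to its k-th nearest neighbour.

    If fewer than k other points exist, the farthest neighbour is used instead.
    """
    scales = []
    for i, p in enumerate(points):
        d = sorted(math.sqrt(sq_dist(p, q)) for j, q in enumerate(points) if j != i)
        scales.append(d[min(k, len(d)) - 1] if d else 0.0)
    return scales


def locally_scaled_similarity(points: List[Point], k: int = 7) -> List[List[float]]:
    """Local-scaling kernel (assumed example of a modified Gaussian kernel):
    W_ij = exp(-||x_i - x_j||^2 / (sigma_i * sigma_j)), diagonal left at 0.
    """
    sig = local_scales(points, k)
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d2 = sq_dist(points[i], points[j])
            denom = sig[i] * sig[j]
            if denom > 0:
                s = math.exp(-d2 / denom)
            else:
                # Degenerate case (duplicate points): identical points get similarity 1.
                s = 1.0 if d2 == 0 else 0.0
            w[i][j] = w[j][i] = s
    return w


if __name__ == "__main__":
    # A tight cluster, a loose cluster, and one isolated point (made-up toy data).
    data = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
            (5.0, 5.0), (6.0, 5.0), (5.0, 6.0),
            (20.0, 0.0)]
    for row in locally_scaled_similarity(data, k=2):
        print([round(v, 3) for v in row])
```

With one global σ, a value small enough to separate the tight cluster tends to drive similarities inside the loose cluster toward zero. Per-point scales σ_i let each neighbourhood set its own bandwidth, which is the multi-density problem the abstract mentions. Plain Python is used so the sketch has no dependencies; a real implementation would vectorize the pairwise distances.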
