    Manifold Clustering and Visualization with Commute Time Distance

    • Abstract: Existing manifold learning algorithms can effectively learn and visualize the low-dimensional nonlinear manifold structure of high-dimensional data. However, they remain sensitive to the neighborhood size parameter, which is difficult to select efficiently, and they require the data to be well sampled from a single manifold. To reduce this sensitivity and to achieve good clustering and visualization of multi-manifold data, this paper proposes a novel manifold learning algorithm based on commute time distance, called CTD-ISOMAP (commute time distance isometric mapping). Compared with Euclidean distance, commute time distance probabilistically takes into account all paths connecting two points in the neighborhood graph; it is therefore more robust and also captures the intrinsic geometric structure of the data. CTD-ISOMAP uses commute time distance to identify and delete the shortcut ("short-circuit") edges and the inter-manifold edges that may exist in the neighborhood graph. Removing shortcut edges yields good visualization of manifold data over a much wider range of neighborhood sizes, and removing inter-manifold edges improves the clustering of multi-manifold data obtained by spectral clustering. Experimental results confirm the effectiveness of CTD-ISOMAP.
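The abstract does not give the formula used for commute time distance or the rule for deleting edges, so the following is only a minimal sketch under stated assumptions. It uses the standard graph-theoretic identity for a connected, undirected, weighted graph, C(i, j) = vol(G) (L+_ii + L+_jj - 2 L+_ij), where L+ is the pseudoinverse of the graph Laplacian and vol(G) is the sum of all node degrees. The function names and the toy graph (two 4-node cliques joined by a single bridge edge, standing in for an inter-manifold edge) are hypothetical and are not taken from the paper.

```python
import numpy as np


def commute_time_distances(W):
    """Pairwise commute time distances for a connected, undirected graph.

    W is a symmetric, non-negative weight (adjacency) matrix.
    Assumption: the graph is connected; the identity below does not
    hold across disconnected components.
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)
    laplacian = np.diag(degrees) - W          # L = D - W
    l_pinv = np.linalg.pinv(laplacian)        # Moore-Penrose pseudoinverse L+
    volume = degrees.sum()                    # vol(G) = sum of degrees
    diag = np.diag(l_pinv)
    # C(i, j) = vol(G) * (L+_ii + L+_jj - 2 L+_ij)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * l_pinv)


def two_cliques_with_bridge(n=4):
    """Toy graph: two n-node cliques joined by one unit-weight bridge edge."""
    W = np.zeros((2 * n, 2 * n))
    W[:n, :n] = 1.0
    W[n:, n:] = 1.0
    np.fill_diagonal(W, 0.0)                  # no self-loops
    W[n - 1, n] = W[n, n - 1] = 1.0           # the single bridge edge
    return W


W = two_cliques_with_bridge()
C = commute_time_distances(W)
bridge_ctd = C[3, 4]   # edge joining the two cliques
inside_ctd = C[0, 1]   # edge inside the first clique
```

In this toy graph the bridge edge has a larger commute time distance than an edge inside a clique, because a random walk has only one route across the bridge but many routes between clique members. This is the kind of signal that could flag inter-manifold or shortcut edges; any threshold for deleting such edges would be a further assumption.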

       
