ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2015, Vol. 52 ›› Issue (8): 1757-1767. doi: 10.7544/issn1000-1239.2015.20150247

Special topic: 2015 Artificial Intelligence Technologies for Big Data

• Artificial Intelligence •

Manifold Clustering and Visualization Based on Commute Time Distance

Shao Chao, Zhang Xiaojian

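  1. (School of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450002) (sc_flying@163.com)
  • Published: 2015-08-01
  • Funding: National Natural Science Foundation of China (61202285)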

Manifold Clustering and Visualization with Commute Time Distance

Shao Chao, Zhang Xiaojian   

  1. (College of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450002)
  • Online: 2015-08-01

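Abstract (from the Chinese original): Existing manifold learning algorithms can learn and visualize the low-dimensional nonlinear manifold structure of high-dimensional data fairly well, but they remain sensitive to the neighborhood size parameter, which is difficult to select efficiently, and they require the data to be well sampled from a single manifold. To reduce the sensitivity of manifold learning algorithms to the neighborhood size parameter, and to achieve good clustering and visualization of multi-manifold data, a new manifold learning algorithm based on commute time distance is proposed: CTD-ISOMAP (commute time distance isometric mapping). Compared with Euclidean distance, commute time distance takes into account, in probabilistic form, all the paths connecting two points on the neighborhood graph; it is not only more robust but can also express the intrinsic geometric structure of the data. By using commute time distance, CTD-ISOMAP can therefore identify and delete the "short-circuit" edges that may exist in the neighborhood graph, as well as the edges connecting different manifolds, so that it visualizes manifold data well over a wider range of neighborhood sizes and improves clustering on multi-manifold data. The experimental results confirm the effectiveness of the algorithm.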
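
For reference, the commute time between two nodes of a weighted graph is usually written as below in the random-walk literature. The abstract does not say which variant the algorithm uses (the commute time itself or its square root), and the symbols here are assumptions introduced only for illustration.

```latex
% Assumed notation (not taken from the abstract):
%   W = (w_{ij})  symmetric edge weights of the neighborhood graph
%   d_i = \sum_j w_{ij},  D = \mathrm{diag}(d_1, \dots, d_n),  L = D - W
%   L^{+}  Moore-Penrose pseudoinverse of the graph Laplacian L
\[
  C(i,j) \;=\; \operatorname{vol}(G)\,\bigl(L^{+}_{ii} + L^{+}_{jj} - 2\,L^{+}_{ij}\bigr),
  \qquad
  \operatorname{vol}(G) \;=\; \sum_{i=1}^{n} d_i .
\]
```

Because this quantity depends on every path between the two nodes rather than on a single edge, an edge that bridges two otherwise weakly connected regions tends to receive a large value, which is consistent with the abstract's use of it to spot "short-circuit" and inter-manifold edges.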

Keywords (from the Chinese original): manifold learning, isometric mapping, clustering, neighborhood size, commute time distance

Abstract: Existing manifold learning algorithms can effectively learn and visualize the low-dimensional nonlinear manifold structure of high-dimensional data. However, most of them are sensitive to the neighborhood size, which is difficult to select efficiently, and require the data to be well sampled from a single manifold. To reduce this sensitivity and to support effective visualization and clustering of multi-manifold data, this paper uses the commute time distance to build a novel manifold learning algorithm called CTD-ISOMAP (commute time distance isometric mapping). Compared with Euclidean distance, commute time distance probabilistically aggregates all the paths connecting any two points in the neighborhood graph. Consequently, it reflects the intrinsic nonlinear geometric structure of the given data while remaining robust, which makes it suitable for identifying the shortcut edges and inter-manifold edges that may exist in the neighborhood graph. By eliminating the shortcut edges, CTD-ISOMAP recovers the low-dimensional nonlinear manifold structure over a much wider range of neighborhood sizes; by eliminating the inter-manifold edges, it improves the clustering of multi-manifold data obtained by spectral clustering. The experimental study verifies the effectiveness of CTD-ISOMAP.
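
The idea of scoring neighborhood-graph edges by commute time can be illustrated with a minimal Python sketch. It is not the paper's implementation: the function names, the 0/1 edge weights, and the median-based threshold in `suspicious_edges` are hypothetical choices made for this example, since the abstract does not specify how edges are selected for deletion. The commute time is computed from the pseudoinverse of the graph Laplacian, which assumes a connected graph.

```python
import numpy as np

def knn_graph(X, k):
    """Symmetric k-nearest-neighbor adjacency matrix with 0/1 weights."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)              # a point is not its own neighbor
    nbrs = np.argsort(d, axis=1)[:, :k]      # indices of the k closest points
    W = np.zeros_like(d)
    rows = np.repeat(np.arange(len(X)), k)
    W[rows, nbrs.ravel()] = 1.0
    return np.maximum(W, W.T)                # keep an edge if either end chose it

def commute_time(W):
    """All-pairs commute time of a connected, undirected weighted graph."""
    deg = W.sum(axis=1)
    L = np.diag(deg) - W                     # unnormalized graph Laplacian
    Lp = np.linalg.pinv(L)                   # Moore-Penrose pseudoinverse
    diag = np.diag(Lp)
    return deg.sum() * (diag[:, None] + diag[None, :] - 2.0 * Lp)

def suspicious_edges(W, factor=2.0):
    """Edges whose commute time exceeds factor * median over all edges.

    The threshold rule is a hypothetical stand-in, not the paper's criterion.
    """
    C = commute_time(W)
    i, j = np.nonzero(np.triu(W))            # each undirected edge once, i < j
    ct = C[i, j]
    mask = ct > factor * np.median(ct)
    return list(zip(i[mask].tolist(), j[mask].tolist()))

def two_cliques(m):
    """Toy graph: two m-node cliques joined by one bridge edge (m-1, m)."""
    W = np.zeros((2 * m, 2 * m))
    W[:m, :m] = 1.0
    W[m:, m:] = 1.0
    np.fill_diagonal(W, 0.0)
    W[m - 1, m] = W[m, m - 1] = 1.0
    return W

if __name__ == "__main__":
    print(suspicious_edges(two_cliques(5)))
```

In the toy graph the bridge is the only route between the two cliques, so its commute time is well above that of the edges inside each clique and it is the only edge flagged. On real data the graph would come from something like `knn_graph`, and the threshold would need tuning.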

Key words: manifold learning, isometric mapping (ISOMAP), clustering, neighborhood size, commute time distance

CLC number: