ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2015, Vol. 52, Issue (8): 1757-1767. doi: 10.7544/issn1000-1239.2015.20150247

Special Issue: 2015 Artificial Intelligence Techniques for Big Data


Manifold Clustering and Visualization with Commute Time Distance

Shao Chao, Zhang Xiaojian   

  1. (College of Computer and Information Engineering, Henan University of Economics and Law, Zhengzhou 450002)
  Online: 2015-08-01

Abstract: Existing manifold learning algorithms can effectively learn and visualize the low-dimensional nonlinear manifold structure of high-dimensional data. However, most of them are sensitive to the neighborhood size, which is difficult to choose, and they require the data to be sampled from a single manifold. To reduce this sensitivity to the neighborhood size and to enable effective visualization and clustering of multi-manifold data, this paper uses the commute time distance to propose a novel manifold learning algorithm, CTD-ISOMAP (commute time distance isometric mapping). Unlike the Euclidean distance, the commute time distance probabilistically aggregates all paths connecting two points in the neighborhood graph. It therefore reflects the intrinsic nonlinear geometric structure of the data while remaining robust, which makes it suitable for identifying the shortcut edges and inter-manifold edges that may exist in the neighborhood graph. Using the commute time distance, CTD-ISOMAP removes shortcut edges from the neighborhood graph, so that its output recovers the low-dimensional nonlinear manifold structure over a much wider range of neighborhood sizes; it also removes inter-manifold edges, which improves the clustering of multi-manifold data obtained by spectral clustering. Experimental results verify the effectiveness of CTD-ISOMAP.
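The commute time between two nodes of a connected graph can be computed from the pseudoinverse L⁺ of the graph Laplacian L = D − W as CT(i, j) = vol(G)·(L⁺ᵢᵢ + L⁺ⱼⱼ − 2L⁺ᵢⱼ), where vol(G) is the sum of all node degrees. The sketch below is illustrative only and assumes NumPy, an unweighted symmetric k-nearest-neighbour graph, and a connected graph (commute time is infinite between disconnected components). The helper names `knn_graph`, `commute_time_distance` and `flag_long_commute_edges` are hypothetical, and the edge-flagging rule (an edge is suspicious if its commute time exceeds `factor` times the median edge commute time) is an assumed heuristic for showing why a bridging edge stands out, not the rule used by CTD-ISOMAP.

```python
import numpy as np


def knn_graph(X, k):
    """Unweighted, symmetric k-nearest-neighbour adjacency matrix.

    Assumption: unit edge weights; the paper may weight edges differently.
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    # Pairwise Euclidean distances, shape (n, n).
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    W = np.zeros((n, n))
    for i in range(n):
        # Position 0 of the sorted row is the point itself (distance 0).
        nbrs = np.argsort(D[i])[1:k + 1]
        W[i, nbrs] = 1.0
    # Keep an edge if either endpoint lists the other as a neighbour.
    return np.maximum(W, W.T)


def commute_time_distance(W):
    """Commute time between every pair of nodes of a connected graph.

    CT(i, j) = vol(G) * (Lp[i, i] + Lp[j, j] - 2 * Lp[i, j]),
    where Lp is the Moore-Penrose pseudoinverse of L = D - W.
    """
    W = np.asarray(W, dtype=float)
    deg = W.sum(axis=1)
    L = np.diag(deg) - W
    Lp = np.linalg.pinv(L)
    d = np.diag(Lp)
    return deg.sum() * (d[:, None] + d[None, :] - 2.0 * Lp)


def flag_long_commute_edges(W, factor=2.0):
    """HYPOTHETICAL heuristic: flag edges whose commute time exceeds
    `factor` times the median commute time over all graph edges."""
    CT = commute_time_distance(W)
    i, j = np.nonzero(np.triu(W))  # each undirected edge once, i < j
    ct_edges = CT[i, j]
    thresh = factor * np.median(ct_edges)
    return [(int(a), int(b)) for a, b, c in zip(i, j, ct_edges) if c > thresh]


if __name__ == "__main__":
    # Two 5-node cliques joined by a single bridging edge (4, 5),
    # standing in for an inter-manifold edge.
    W = np.zeros((10, 10))
    for block in (range(0, 5), range(5, 10)):
        for a in block:
            for b in block:
                if a != b:
                    W[a, b] = 1.0
    W[4, 5] = W[5, 4] = 1.0
    CT = commute_time_distance(W)
    print(round(CT[4, 5], 6), round(CT[0, 1], 6))  # 42.0 16.8
    print(flag_long_commute_edges(W))  # [(4, 5)]
```

Both printed pairs are single edges of the graph, yet the bridge has a commute time 2.5 times larger: a random walk can cross it only one way, while an edge inside a clique is backed by many parallel paths. This is the effect the abstract appeals to when it says commute time aggregates all connecting paths and can therefore expose shortcut and inter-manifold edges.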

Key words: manifold learning, isometric mapping (ISOMAP), clustering, neighborhood size, commute time distance
