Abstract:
The existing manifold learning algorithms can effectively learn and visualize the low-dimensional nonlinear manifold structure of high-dimensional data. However, most efforts to date select the neighborhood size in sensitivity and difficulty, and require sampling the data from a single manifold. To reduce the sensitivity of manifold learning algorithms to the neighborhood size, and address the effective visualization and clustering of multi-manifold data, this paper employs the commute time distance to propose a novel manifold learning algorithm, called CTD-ISOMAP (commute time distance isometric mapping). Compared with Euclidean distance, commute time distance probabilistically synthesizes all the paths connecting any two points in the neighborhood graph. Consequently, it takes into account the intrinsic nonlinear geometric structure for the given data, while still providing the robust results, and then is suitable to identify the shortcut edges and the inter-manifold edges possibly existed in the neighborhood graph. CTD-ISOMAP with the commute time distance, therefore, effectively eliminates the shortcut edges in the neighborhood graph, so that each output achieves the low-dimensional nonlinear manifold structure in the much wider range of the neighborhood size, and eliminates the inter-manifold edges in the neighborhood graph to boost the clustering on multi-manifold data obtained by spectral clustering. Finally, our experimental study verifies the effectiveness of CTD-ISOMAP.