ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (1): 200-210. doi: 10.7544/issn1000-1239.2015.20130493

A Hierarchical Co-Clustering Algorithm for High-Order Heterogeneous Data

Yang Xinxin, Huang Shaobin   

  1. (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001)
  • Online: 2015-01-01

Abstract: High-order heterogeneous data, in which objects are described by multiple feature sets drawn from heterogeneous domains, are increasingly common in real-world applications. High-order co-clustering algorithms fuse information from multiple feature spaces to improve clustering quality, and have recently become an active research topic. Most existing high-order co-clustering algorithms are non-hierarchical, yet such data always contain hidden hierarchical cluster structures. To mine these hidden patterns more effectively, we develop a high-order hierarchical co-clustering algorithm (HHCC). The association between objects and features is measured with Goodman-Kruskal τ, an index of association between categorical variables. Strongly associated objects are partitioned into the same object cluster and, simultaneously, strongly associated features are partitioned into the same feature cluster. HHCC uses Goodman-Kruskal τ to quantify the quality of the object and feature clusterings at every level. By optimizing Goodman-Kruskal τ with a local search approach, the number of clusters is determined automatically and the clustering at each level of the hierarchy is obtained. A top-down strategy is adopted, ultimately yielding a tree-like cluster structure. Experimental results demonstrate that HHCC outperforms four classical homogeneous hierarchical algorithms and five previous high-order co-clustering algorithms.
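
The abstract names Goodman-Kruskal τ but does not state its formula. As an illustration only, the sketch below computes the standard Goodman-Kruskal τ for a two-way contingency table, measuring how much knowing the row category reduces the error in predicting the column category. The function name `goodman_kruskal_tau`, the list-of-lists input, and the convention of returning 0 when the column variable is constant are assumptions made here; how HHCC builds its contingency tables from object and feature clusters is not described in this abstract and is not reproduced.

```python
from typing import Sequence


def goodman_kruskal_tau(table: Sequence[Sequence[float]]) -> float:
    """Goodman-Kruskal tau for predicting the column variable from the row variable.

    table[i][j] is the count of observations in row category i and
    column category j (hypothetical input layout chosen for this sketch).
    """
    total = sum(sum(row) for row in table)
    if total <= 0:
        raise ValueError("contingency table must contain at least one observation")

    n_cols = len(table[0])
    col_sums = [sum(row[j] for row in table) for j in range(n_cols)]

    # Sum of squared column marginals: prediction quality without row information.
    marginal_term = sum((c / total) ** 2 for c in col_sums)

    # Sum over cells of p_ij^2 / p_i.: prediction quality given the row category.
    conditional_term = 0.0
    for row in table:
        row_sum = sum(row)
        if row_sum == 0:
            continue  # an empty row category contributes nothing
        conditional_term += sum((n / total) ** 2 for n in row) / (row_sum / total)

    denominator = 1.0 - marginal_term
    if denominator == 0.0:
        # Assumed convention: a constant column variable leaves nothing to explain.
        return 0.0
    return (conditional_term - marginal_term) / denominator
```

Under this definition, τ equals 1 when each row category determines the column category exactly (for example `[[5, 0], [0, 5]]`) and 0 when rows and columns are independent (for example `[[2, 2], [2, 2]]`). The measure is asymmetric, so transposing the table gives the value for predicting rows from columns.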

Key words: high-order heterogeneous data, co-clustering, hierarchical clustering, measure of association, multiple feature space
