ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (2): 459-466. doi: 10.7544/issn1000-1239.2016.20148284


Large-Scale Heterogeneous Data Co-Clustering Based on Nonnegative Matrix Factorization

Shen Guowei, Yang Wu, Wang Wei, Yu Miao, Dong Guozhong

  1. (Research Center of Information Security, Harbin Engineering University, Harbin 150001)
  • Online: 2016-02-01

Abstract: A heterogeneous information network contains entities of multiple types and the interactive relations among them. Several co-clustering algorithms have been proposed to mine the underlying structure of the different entity types. However, as data scale increases, the sizes of the different entity classes become increasingly unbalanced, and the heterogeneous relational data become extremely sparse. To address this problem, we propose FNMTF-CM, a two-step co-clustering algorithm based on correlation matrix decomposition. In the first step, a correlation matrix is built from the correlations among entities of the smaller type and is decomposed, by symmetric nonnegative matrix factorization, into the indicator matrix of that entity type. The correlation matrix is denser and smaller than the original heterogeneous relation matrix, so the algorithm can process large-scale heterogeneous data while maintaining high precision. In the second step, this indicator matrix is used directly as input, which makes the tri-factorization of the heterogeneous relation matrix very fast. Experiments on synthetic and real-world heterogeneous data sets show that FNMTF-CM outperforms traditional co-clustering algorithms based on nonnegative matrix factorization in both accuracy and performance.
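The two-step idea described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' FNMTF-CM implementation: the abstract does not say how the correlation matrix is built or which update rules are used, so the sketch assumes the correlation matrix is A = R R^T for a relation matrix R whose rows are the smaller entity type, and it assumes standard multiplicative updates for both factorizations. The function names `symmetric_nmf` and `tri_factorize_fixed_g`, the toy data, and all parameters are hypothetical.

```python
import numpy as np

def symmetric_nmf(A, k, iters=200, beta=0.5, seed=0, eps=1e-9):
    """Approximate A by G @ G.T with G >= 0 (damped multiplicative updates)."""
    rng = np.random.default_rng(seed)
    G = rng.random((A.shape[0], k))
    for _ in range(iters):
        # Damped update: G <- G * (1 - beta + beta * (A G) / (G G^T G))
        G *= (1.0 - beta) + beta * (A @ G) / (G @ (G.T @ G) + eps)
    return G

def tri_factorize_fixed_g(R, G, k_large, iters=200, seed=0, eps=1e-9):
    """Approximate R by G @ S @ F.T, keeping G fixed and updating S and F."""
    rng = np.random.default_rng(seed)
    S = rng.random((G.shape[1], k_large))
    F = rng.random((R.shape[1], k_large))
    GtG = G.T @ G   # small (k x k) matrix, computed once
    GtR = G.T @ R   # (k x n) projection of R, computed once
    for _ in range(iters):
        S *= (GtR @ F) / (GtG @ S @ (F.T @ F) + eps)
        F *= (GtR.T @ S) / (F @ (S.T @ GtG @ S) + eps)
    return S, F

# Toy relation matrix (assumed data): 6 entities of the smaller type,
# 40 entities of the larger type, two planted co-clusters plus light noise.
rng = np.random.default_rng(0)
R = np.zeros((6, 40))
R[:3, :20] = 1.0
R[3:, 20:] = 1.0
R += 0.01 * rng.random(R.shape)

# Step 1: factorize the small, dense correlation matrix (assumed to be R R^T).
A = R @ R.T
G = symmetric_nmf(A, k=2)

# Step 2: reuse G as a fixed input to the tri-factorization of R.
S, F = tri_factorize_fixed_g(R, G, k_large=2)

# Cluster labels are read off as the largest entry in each row.
small_labels = G.argmax(axis=1)
large_labels = F.argmax(axis=1)
```

In this sketch the first step only touches the 6 x 6 matrix A, and the second step works with the precomputed products `GtG` and `GtR` rather than with G itself, which is one way to see why fixing the smaller type's indicator matrix could make the tri-factorization cheap.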

Key words: heterogeneous network, co-clustering, nonnegative matrix factorization, large-scale data, correlation matrix
