ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (7): 1370-1382.doi: 10.7544/issn1000-1239.2019.20180470

Previous Articles     Next Articles

Cross-Media Clustering by Share and Private Information Maximization

Yan Xiaoqiang, Ye Yangdong   

  1. (School of Information Engineering, Zhengzhou University, Zhengzhou 450001)
  • Online:2019-07-01

Abstract: Recently, the rapid emergence of cross media data with typical multi-source and heterogeneous characteristic brings great challenges to the traditional data analysis approaches. However, the most of existing approaches for cross media data heavily rely on the shared latent feature space to construct the relationships between multiple modalities, while ignoring the private information hidden in each modality. Aiming at this problem, this paper proposes a novel share and private information maximization (SPIM) algorithm for cross media data clustering, which leverages the shared and private information into the clustering process. Firstly, we present two shared information construction models: 1) Hybrid words (H-words) model. In this model, the low-level features in each modality are transformed into words or visual words co-occurrence vector, then a novel agglomerative information maximization is presented to build the hybrid word space for all modalities, which ensures the statistical correlation between the low-level features of multiple modalities. 2) Clustering ensemble (CE) model. This model adopts the mutual information to measure the similarity between the clustering partitions of different modalities, which ensures the semantic correlation of the high-level clustering partitions. Secondly, SPIM algorithm integrates the shared information of multiple modalities and the private information of individual modalities into a unified objective function. Finally, the optimization of SPIM algorithm is performed by a sequential “draw-and-merge” procedure, which guarantees the function converge to a local maximum. The experimental results on 6 cross media datasets show that the proposed approach compares favorably with the existing state-of-the-art cross-media clustering methods.

Key words: cross-media, multi-source heterogeneous, share and private information, information maximization, mutual information, clustering analyse

CLC Number: