Mixture of Probabilistic Canonical Correlation Analysis
Abstract
Canonical correlation analysis (CCA) is a statistical tool for analyzing the correlation between two sets of random variables. A critical limitation of CCA is that it can only detect linear correlation between the two domains that is globally valid across both data sets, which is insufficient to reveal the many non-linear correlations found in real-world data. Three main approaches address this limitation: kernel mapping, neural networks, and localization. In this paper, we construct a mixture model of local linear probabilistic canonical correlation analysis (PCCA) models, called MixPCCA, based on the idea of localization, and propose a two-stage EM algorithm to estimate the model parameters. A fundamental issue is how to determine the number of local linear models; we address it within the framework of cluster ensembles. In addition, we put forward a theoretical framework for applying the MixPCCA model to pattern recognition. Results on the USPS and MNIST handwritten image datasets demonstrate that the proposed MixPCCA model not only captures complex global non-linear correlation but can also detect correlation that exists only in local regions, which traditional CCA and PCCA fail to discover.
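As a rough illustration of why localization can help, the sketch below computes the first canonical correlation of plain linear CCA on a synthetic two-view data set, once globally and once on each of two hand-picked regions. It is not the MixPCCA model or its two-stage EM algorithm: the data, the split at zero, and the helper name `first_canonical_corr` are all assumptions made for this example, and the regions are fixed by hand rather than learned.

```python
import numpy as np


def first_canonical_corr(X, Y, reg=1e-8):
    """Largest canonical correlation between views X (n x p) and Y (n x q).

    Hypothetical helper for illustration: classical linear CCA computed
    by whitening both views and taking the top singular value.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    # Sample covariance blocks, with a small ridge for numerical stability.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Cholesky factors: Sxx = Lx Lx^T and Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    return np.linalg.svd(M, compute_uv=False)[0]


# Synthetic two-view data (an assumption for this sketch): the first
# coordinate of Y depends on the first coordinate of X through |t|,
# which is linear on each side of zero but not globally.
rng = np.random.default_rng(0)
n = 2000
t = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([t, rng.normal(size=n)])
Y = np.column_stack([np.abs(t) + 0.05 * rng.normal(size=n),
                     rng.normal(size=n)])

# Global linear CCA sees almost no correlation.
global_rho = first_canonical_corr(X, Y)

# Linear CCA restricted to each region recovers a strong correlation.
pos = t > 0
local_rho_pos = first_canonical_corr(X[pos], Y[pos])
local_rho_neg = first_canonical_corr(X[~pos], Y[~pos])

print("global:", global_rho)
print("local (t > 0):", local_rho_pos)
print("local (t <= 0):", local_rho_neg)
```

In this toy setting the global canonical correlation is close to zero while each local one is close to one, which is the kind of locally valid linear structure a mixture of local linear models is meant to capture.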