ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development) ›› 2015, Vol. 52 ›› Issue (7): 1463-1476. doi: 10.7544/issn1000-1239.2015.20140236

• Artificial Intelligence •

混合概率典型相关性分析

张博1,2,3,郝杰4 , 马刚1,2,岳金朋1,2,张建华1,2,史忠植1   

  1(中国科学院计算技术研究所智能信息处理重点实验室 北京 100190); 2(中国科学院大学 北京 100049); 3(中国矿业大学计算机科学与技术学院 江苏徐州 221116); 4(徐州医学院医学信息学院 江苏徐州 221004) (zhangb@ics.ict.ac.cn)
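  • Published: 2015-07-01
  • Supported by: 
    National Basic Research Program of China (973 Program) (2013CB329502); National Natural Science Foundation of China (61035003, 61202212, 61072085, 60933004, 61379101); National High Technology Research and Development Program of China (863 Program) (2012AA011003); National Key Technology R&D Program of China (2012BA107B02)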

Mixture of Probabilistic Canonical Correlation Analysis

Zhang Bo1,2,3, Hao Jie4, Ma Gang1,2, Yue Jinpeng1,2, Zhang Jianhua1,2, Shi Zhongzhi1   

  1(Key Laboratory of Intelligent Information Processing, Institute of Computing Technology, Chinese Academy of Sciences, Beijing 100190); 2(University of Chinese Academy of Sciences, Beijing 100049); 3(School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, Jiangsu 221116); 4(School of Medicine Information, Xuzhou Medical College, Xuzhou, Jiangsu 221004)
  • Online: 2015-07-01

摘要: 典型相关性分析(canonical correlation analysis, CCA)是一种用来分析2组随机变量之间相关性的统计分析工具,但作为一种线性数学模型,CCA不足以揭示真实世界中大量存在的非线性相关现象.采用局部化的方法,在概率典型相关性分析(probabilistic CCA, PCCA)的基础上,使用概率混合模型框架,提出了混合概率典型相关性分析模型(mixture of probabilistic CCA, MixPCCA)以及估计模型参数的2阶段期望最大化(expectation maximization, EM)算法,并给出了使用聚类融合确定局部线性模型数量的方法和MixPCCA模型应用于模式识别的理论框架.在手写体数据集USPS和MNIST上的实验证明,MixPCCA模型通过混合多个局部线性PCCA模型不仅提供了一种捕捉复杂的全局非线性相关性的解决方案,而且还具备检测只在局部区域才存在的相关性的能力.

关键词: 典型相关性分析, 概率典型相关性分析, 混合概率模型, 聚类融合, 模式识别

Abstract: Canonical correlation analysis (CCA) is a statistical tool for analyzing the correlation between two sets of random variables. A critical limitation of CCA is that it can detect only linear correlation between the two domains that holds globally across both data sets, so it cannot reveal the many non-linear correlations that occur in the real world. There are three main ways to address this limitation: kernel mapping, neural networks, and localization. In this paper, a mixture of local linear probabilistic canonical correlation analysis (PCCA) models, called MixPCCA, is constructed based on the idea of localization, and a two-stage expectation maximization (EM) algorithm is proposed to estimate the model parameters. Determining the number of local linear models is a fundamental issue, which we address within the framework of cluster ensembles. In addition, a theoretical framework for applying the MixPCCA model to pattern recognition is put forward. Results on the USPS and MNIST handwritten image datasets demonstrate that the proposed MixPCCA model not only captures complex global non-linear correlation but can also detect correlations that exist only in local regions, which traditional CCA and PCCA fail to discover.
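The limitation described in the abstract can be illustrated with a small NumPy sketch. It is not the MixPCCA model or its two-stage EM algorithm: the synthetic data, the function name `first_canonical_correlation`, the regularization constant `reg`, and the use of known group labels in place of learned mixture components are all assumptions made for illustration. The sketch computes the largest canonical correlation of classical linear CCA, first on pooled data and then separately within each group, for data in which the two views are positively correlated in one group and negatively correlated in the other.

```python
import numpy as np

def first_canonical_correlation(X, Y, reg=1e-8):
    """Largest canonical correlation between views X and Y (rows are samples)."""
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    n = X.shape[0]
    # Sample covariance blocks; a small ridge (an assumption) keeps them invertible.
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten both views with Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # M = Lx^{-1} Sxy Ly^{-T}; its singular values are the canonical correlations.
    M = np.linalg.solve(Lx, Sxy)
    M = np.linalg.solve(Ly, M.T).T
    return np.linalg.svd(M, compute_uv=False)[0]

# Synthetic two-view data (hypothetical): a shared latent variable z drives the
# first coordinate of each view, with opposite sign in the second group.
rng = np.random.default_rng(0)
n = 500
labels = np.repeat([0, 1], n)
sign = np.where(labels == 0, 1.0, -1.0)
z = rng.standard_normal(2 * n)
X = np.column_stack([z + 0.3 * rng.standard_normal(2 * n),
                     rng.standard_normal(2 * n)])
Y = np.column_stack([sign * z + 0.3 * rng.standard_normal(2 * n),
                     rng.standard_normal(2 * n)])

# One global linear model versus one linear model per (known) group.
global_rho = first_canonical_correlation(X, Y)
local_rho = [first_canonical_correlation(X[labels == k], Y[labels == k])
             for k in (0, 1)]

print("global:", global_rho)
print("local: ", local_rho)
```

On this data the pooled estimate is close to zero while each within-group estimate is high, because the two opposite-signed relationships cancel in the pooled cross-covariance. A mixture model would have to infer the group memberships from the data rather than receive them as labels.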

Key words: canonical correlation analysis, probabilistic canonical correlation analysis, mixture probabilistic model, cluster ensembles, pattern recognition

CLC Number: