    Citation: Fang Minquan, Zhang Weimin, Zhou Haifang. Parallel Algorithm of Fast Independent Component Analysis for Dimensionality Reduction on Many Integrated Core[J]. Journal of Computer Research and Development, 2016, 53(5): 1136-1146. DOI: 10.7544/issn1000-1239.2016.20148080

    Parallel Algorithm of Fast Independent Component Analysis for Dimensionality Reduction on Many Integrated Core

    Abstract: Fast independent component analysis (FastICA) for dimensionality reduction of hyperspectral remote sensing images involves large-scale matrix computations and many iterations. Through hotspot analysis of the FastICA algorithm, we design parallel schemes for covariance matrix calculation, whitening, and ICA iteration on the Many Integrated Core (MIC) architecture, and we propose and implement a parallel dimensionality reduction algorithm, M-FastICA, together with a performance model for it. We also study optimization strategies for parallel programs on MIC and present a series of optimizations for the parallel scheme of each hotspot: reforming arithmetic operations, interchanging and unrolling loops, transposing matrices, using intrinsics, and so on. In particular, we propose a novel load-balancing method for processing the lower triangular matrix, and we measure the effect of each optimization. In our experiments, M-FastICA reaches a maximum speedup of 42×, and it runs 2.2 times faster than the multithreaded CPU version on 24 cores. We also investigate how the speedup changes with the number of bands. The experimental results validate the performance model with acceptable accuracy, so the model can serve as a roofline for our optimization effort.
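
The abstract does not say how the lower-triangular load balancing works, so the following is only an illustrative sketch of one common approach, not the authors' method. It assumes that only the lower triangle of the symmetric covariance matrix is computed, so row i costs i + 1 element updates, and it compares contiguous row blocks with a "folded" assignment that pairs row i with row n - 1 - i. The function names, the 224-band image, and the 8 workers are hypothetical choices made for the example.

```python
def row_load(rows):
    """Work for a set of lower-triangle rows: row r holds r + 1 elements."""
    return sum(r + 1 for r in rows)


def block_partition(n_rows, n_workers):
    """Baseline: give each worker one contiguous block of rows."""
    chunk = -(-n_rows // n_workers)  # ceiling division
    return [
        list(range(w * chunk, min((w + 1) * chunk, n_rows)))
        for w in range(n_workers)
    ]


def folded_partition(n_rows, n_workers):
    """Assumed balancing scheme: pair a short row with a long row.

    Rows k and n_rows - 1 - k together hold n_rows + 1 elements,
    so every pair costs the same; pairs are dealt out round-robin.
    """
    parts = [[] for _ in range(n_workers)]
    for k in range((n_rows + 1) // 2):
        lo, hi = k, n_rows - 1 - k
        worker = k % n_workers
        parts[worker].append(lo)
        if hi != lo:  # the middle row of an odd-sized matrix stands alone
            parts[worker].append(hi)
    return parts


if __name__ == "__main__":
    bands, workers = 224, 8  # hypothetical problem size
    block = [row_load(p) for p in block_partition(bands, workers)]
    folded = [row_load(p) for p in folded_partition(bands, workers)]
    # Block loads range from 406 to 5894; folded loads are all 3150.
    print("block :", block)
    print("folded:", folded)
```

With contiguous blocks the last worker does more than 14 times the work of the first, so the whole parallel region waits on it; the folded assignment gives every worker the same element count when the number of row pairs divides evenly among the workers.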

       
