Abstract:
Fast independent component analysis (FastICA) for hyperspectral image dimensionality reduction involves a large volume of matrix and iterative computation. By analyzing the hotspots of the FastICA algorithm, we design parallel schemes for covariance matrix calculation, whitening, and ICA iteration on the Many Integrated Core (MIC) architecture, and we implement and validate the resulting M-FastICA algorithm. We further present a performance model for M-FastICA. We then propose a series of optimization methods for the parallel schemes of the different hotspots, including restructuring arithmetic operations, interchanging and unrolling loops, transposing matrices, and using intrinsics. In particular, we propose a novel method to balance the load when processing the lower triangular matrix. We measure the performance effects of these optimization methods. Our experiments show that M-FastICA reaches a maximum speed-up of 42X in our tests and runs 2.2X faster than the parallel CPU version on 24 cores. We also investigate how the speed-up changes with the number of bands. The experimental results validate our performance model with acceptable accuracy, so the model can serve as a roofline for our optimization effort.
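To illustrate why the lower triangular matrix raises a load-balancing problem, the following minimal sketch compares two ways of assigning triangle rows to workers. It is not the balancing method proposed in this work, which the abstract does not specify. It uses a common row-pairing technique as a stand-in, and the function names (`block_schedule`, `paired_schedule`, `loads`) and the assumption that row i of the triangle costs i + 1 entries are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: a generic row-pairing scheme, NOT the paper's method.
# Assumption: computing row i of the lower triangle (diagonal included)
# costs i + 1 entries, as for a symmetric band-by-band covariance matrix.

def row_cost(i):
    """Number of entries in row i of a lower triangle, diagonal included."""
    return i + 1

def block_schedule(n_bands, n_workers):
    """Naive split: contiguous blocks with a near-equal number of rows."""
    base, extra = divmod(n_bands, n_workers)
    schedule, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        schedule.append(list(range(start, start + size)))
        start += size
    return schedule

def paired_schedule(n_bands, n_workers):
    """Pair short row i with long row n-1-i, then deal pairs out round-robin.

    Each full pair costs (i + 1) + (n - i) = n + 1 entries, so every pair
    carries the same amount of work.
    """
    schedule = [[] for _ in range(n_workers)]
    for p in range((n_bands + 1) // 2):
        lo, hi = p, n_bands - 1 - p
        rows = [lo] if lo == hi else [lo, hi]  # middle row is unpaired when n is odd
        schedule[p % n_workers].extend(rows)
    return schedule

def loads(schedule):
    """Total entry count assigned to each worker."""
    return [sum(row_cost(i) for i in rows) for rows in schedule]

if __name__ == "__main__":
    print(loads(block_schedule(8, 4)))   # [3, 7, 11, 15]
    print(loads(paired_schedule(8, 4)))  # [9, 9, 9, 9]
```

With 8 bands and 4 workers, the contiguous split gives the last worker five times the work of the first, while the paired split gives every worker the same load. When the number of pairs is not a multiple of the worker count, the paired split is only approximately balanced.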