Online Correlation Analysis for Multiple Dimensions Data Streams
-
-
Abstract
This paper studies the problem of identifying correlations between two multidimensional data streams under constrained resources. A novel online canonical correlation analysis (CCA) algorithm based on approximation techniques, called QuickCCA, is proposed. To relieve the performance bottleneck of CCA, QuickCCA first applies column sampling with non-uniform probabilities to reduce the number of tuples and construct a synopsis matrix. From this synopsis matrix, the top k principal correlation coefficients between the evolving multidimensional data streams are then computed rapidly. Theoretical analysis and experiments indicate that QuickCCA can accurately identify correlations between two multidimensional data streams in the synchronous sliding-window model. Compared with existing correlation analysis algorithms for data streams, QuickCCA reduces computational complexity by trading some accuracy for performance. It can serve as a generic tool for a wide range of data stream mining applications.
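The two stages named in the abstract, sampling tuples into a synopsis and then computing the leading correlation coefficients from it, can be illustrated with a minimal sketch. It is not the QuickCCA implementation. The sampling probabilities (proportional to the squared norm of each joint tuple), the rescaling factor, the QR-based CCA step, and the names `sample_synopsis` and `canonical_correlations` are all assumptions chosen for illustration; the abstract does not specify them. Tuples are stored as columns, so each window is a p x n or q x n matrix, and both windows are assumed to have full row rank.

```python
import numpy as np


def canonical_correlations(X, Y, k):
    """Top-k canonical correlations of column-aligned, centered X (p x n), Y (q x n).

    Uses orthonormal bases from QR factorizations; the singular values of
    Qx^T Qy are the canonical correlations (assumes full row rank).
    """
    Qx, _ = np.linalg.qr(X.T)  # n x p orthonormal basis
    Qy, _ = np.linalg.qr(Y.T)  # n x q orthonormal basis
    s = np.linalg.svd(Qx.T @ Qy, compute_uv=False)
    return np.clip(s[:k], 0.0, 1.0)  # guard against round-off above 1


def sample_synopsis(X, Y, c, rng):
    """Draw c tuples (columns) with non-uniform probabilities from one window.

    Assumption: probabilities are proportional to the squared norm of the
    joint tuple, and the SAME indices are taken from both streams so that
    cross-stream pairing is preserved. Each sampled column is rescaled by
    1 / sqrt(c * p_i), which makes the synopsis second moments unbiased
    estimates of the full-window second moments.
    """
    Xc = X - X.mean(axis=1, keepdims=True)  # center each dimension
    Yc = Y - Y.mean(axis=1, keepdims=True)
    weights = (Xc ** 2).sum(axis=0) + (Yc ** 2).sum(axis=0)
    probs = weights / weights.sum()
    idx = rng.choice(probs.size, size=c, replace=True, p=probs)
    scale = 1.0 / np.sqrt(c * probs[idx])
    return Xc[:, idx] * scale, Yc[:, idx] * scale
```

In this sketch, calling `sample_synopsis(X, Y, c, rng)` on the current window and passing the two returned matrices to `canonical_correlations(Sx, Sy, k)` yields approximate top k coefficients at a cost that depends on the synopsis size c and not on the window length n. The synopsis is deliberately not re-centered after sampling, since the rescaled columns already estimate the centered second moments.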
-
-