    Online Correlation Analysis for Multiple Dimensions Data Streams

    • Abstract: This paper addresses the problem of quickly detecting correlations between high-dimensional data streams in resource-constrained computing environments. It proposes QuickCCA, a novel online canonical correlation analysis (CCA) algorithm based on approximation. To remove the performance bottleneck of conventional CCA computation, QuickCCA first applies unequal-probability column sampling to reduce the number of stream tuples and build a synopsis matrix. It then uses the synopsis matrix to incrementally compute the top k canonical correlation coefficients between the multi-dimensional data streams. Theoretical analysis and experiments show that QuickCCA accurately identifies, online, the correlations between multi-dimensional data streams under the synchronous sliding window model. Compared with existing algorithms for analyzing correlations among multiple data streams, QuickCCA significantly reduces computational complexity and allows accuracy to be traded against performance. It can serve as a general-purpose analysis tool for a wide range of data stream mining applications.
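
    The two steps named in the abstract (unequal-probability column sampling into a synopsis matrix, then the top k canonical correlations computed from that synopsis) can be illustrated with a minimal NumPy sketch. This is not the QuickCCA algorithm itself: the sampling weights (joint squared column norms), the ridge term, and the function names `center_rows`, `sample_columns`, and `top_k_canonical_correlations` are all assumptions made for illustration, and the sketch handles a single window snapshot rather than the incremental sliding-window update the paper describes.

```python
import numpy as np

def center_rows(A):
    """Subtract each attribute's mean over the window (columns are tuples)."""
    return A - A.mean(axis=1, keepdims=True)

def sample_columns(X, Y, c, rng):
    """Draw c tuple indices with unequal probabilities and rescale them.

    Assumption: the probability of a tuple is proportional to the joint
    squared norm of its X and Y columns. The same indices are used for both
    streams so that the sampled tuples stay aligned in time.
    """
    weights = (X ** 2).sum(axis=0) + (Y ** 2).sum(axis=0)
    probs = weights / weights.sum()
    idx = rng.choice(X.shape[1], size=c, replace=True, p=probs)
    # Rescaling makes Xs @ Ys.T an unbiased estimate of X @ Y.T.
    scale = 1.0 / np.sqrt(c * probs[idx])
    return X[:, idx] * scale, Y[:, idx] * scale

def inv_sqrt(S, ridge=1e-8):
    """Inverse square root of a symmetric PSD matrix (small ridge for stability)."""
    w, V = np.linalg.eigh(S + ridge * np.eye(S.shape[0]))
    return (V / np.sqrt(w)) @ V.T

def top_k_canonical_correlations(X, Y, k):
    """Top k canonical correlations of centered data X (p x n) and Y (q x n).

    They are the largest singular values of Sxx^(-1/2) Sxy Syy^(-1/2).
    """
    M = inv_sqrt(X @ X.T) @ (X @ Y.T) @ inv_sqrt(Y @ Y.T)
    s = np.linalg.svd(M, compute_uv=False)
    return np.clip(s[:k], 0.0, 1.0)

if __name__ == "__main__":
    rng = np.random.default_rng(42)
    n = 5000                                  # tuples in the current window
    shared = rng.standard_normal(n)           # hypothetical common signal
    X = rng.standard_normal((4, n))
    Y = rng.standard_normal((3, n))
    X[0] = shared + 0.3 * rng.standard_normal(n)
    Y[0] = shared + 0.3 * rng.standard_normal(n)

    Xc, Yc = center_rows(X), center_rows(Y)
    Xs, Ys = sample_columns(Xc, Yc, c=1000, rng=rng)  # synopsis matrices
    print("full window:", top_k_canonical_correlations(Xc, Yc, k=2))
    print("synopsis   :", top_k_canonical_correlations(Xs, Ys, k=2))
```

    The synopsis size `c` is the knob behind the accuracy/performance trade-off mentioned in the abstract: a smaller `c` makes the covariance products cheaper but increases the sampling error of the estimated coefficients.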

       
