A Massively Parallel Bayesian Approach to Factorization-Based Analysis of Big Time Series Data

Gao Tengfei, Liu Yongyan, Tang Yunbo, Zhang Lei, Chen Dan

Citation: Gao Tengfei, Liu Yongyan, Tang Yunbo, Zhang Lei, Chen Dan. A Massively Parallel Bayesian Approach to Factorization-Based Analysis of Big Time Series Data[J]. Journal of Computer Research and Development, 2019, 56(7): 1567-1577. DOI: 10.7544/issn1000-1239.2019.20180792. CSTR: 32373.14.issn1000-1239.2019.20180792


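Funding: National Natural Science Foundation of China (61772380); Innovation Group Project of the Natural Science Foundation of Hubei Province (2017CFA007)
Details
  • Chinese Library Classification (CLC) number: TP301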


  • Abstract: Big time series data record the evolution of complex systems over large temporal and spatial scales and capture in detail the interactions among different parts of a system. Extracting the latent low-dimensional factors from such data is crucial to understanding the overall mechanism of the underlying system. Most conventional factorization methods cannot cope with the ultra-high dimensionality and scale of big data, and the lack of a priori knowledge makes the problem harder still. To address this challenge, this study develops a massively parallel Bayesian factorization approach (G-BF) for analyzing tensors formed from massive time series. The approach uses a Bayesian algorithm to derive the factor matrices in the absence of a priori knowledge, then maps the algorithm to the compute unified device architecture (CUDA) model so that the factor matrices are updated in a massively parallel manner. It supports factorization of tensors of arbitrary dimensions. Experimental results indicate that: 1) compared with GPU-hierarchical alternating least squares (G-HALS), a GPU-accelerated factorization algorithm, G-BF achieves better runtime performance, and its advantage grows as the data scale increases; 2) G-BF scales well with data volume, rank, and dimensionality; 3) when G-BF is applied within the existing sub-factor fusion framework (hierarchical-parallel factor analysis, H-PARAFAC), a huge tensor can be factorized as a whole (10^11 data elements on two nodes), a capability two orders of magnitude beyond that of conventional methods.
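To make the notion of "factor matrices" concrete, the following is a minimal CPU sketch of a PARAFAC-style (CP) factorization of a tensor with any number of modes. It is not the G-BF algorithm: the Bayesian update rules and the CUDA mapping are not given in the abstract, so plain alternating least squares in NumPy is swapped in purely for illustration. The choice of the CP model and the function names `cp_reconstruct`, `unfold`, `khatri_rao`, and `cp_als` are assumptions made for this example.

```python
import numpy as np


def cp_reconstruct(factors):
    """Rebuild a tensor from CP factor matrices, each of shape (I_n, R)."""
    rank = factors[0].shape[1]
    tensor = np.zeros([f.shape[0] for f in factors])
    for r in range(rank):
        # Outer product of the r-th column of every factor matrix.
        component = factors[0][:, r]
        for f in factors[1:]:
            component = np.multiply.outer(component, f[:, r])
        tensor += component
    return tensor


def unfold(tensor, mode):
    """Mode-n unfolding: rows index `mode`, columns the other modes (C order)."""
    return np.moveaxis(tensor, mode, 0).reshape(tensor.shape[mode], -1)


def khatri_rao(matrices):
    """Column-wise Kronecker product, ordered to match `unfold`."""
    out = matrices[0]
    for m in matrices[1:]:
        out = (out[:, None, :] * m[None, :, :]).reshape(-1, out.shape[1])
    return out


def cp_als(tensor, rank, n_iter=50, seed=0):
    """Plain alternating least squares (illustrative stand-in, not G-BF)."""
    rng = np.random.default_rng(seed)
    factors = [rng.standard_normal((dim, rank)) for dim in tensor.shape]
    for _ in range(n_iter):
        for mode in range(tensor.ndim):
            others = [factors[m] for m in range(tensor.ndim) if m != mode]
            kr = khatri_rao(others)
            # Solve kr @ A_mode.T ~= unfold(tensor, mode).T in the least-squares sense.
            solution = np.linalg.lstsq(kr, unfold(tensor, mode).T, rcond=None)[0]
            factors[mode] = solution.T
    return factors


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    # Synthetic 4-way tensor of exact rank 2 (hypothetical sizes).
    truth = [rng.standard_normal((dim, 2)) for dim in (6, 5, 4, 3)]
    data = cp_reconstruct(truth)
    estimate = cp_als(data, rank=2)
    error = np.linalg.norm(data - cp_reconstruct(estimate)) / np.linalg.norm(data)
    print("relative reconstruction error:", error)
```

Each mode's update depends only on the unfolded data and the other factor matrices, and the rows of a factor matrix can be solved independently of one another. That row-level independence is the kind of structure a GPU implementation can exploit, although how G-BF actually distributes the work is not stated here.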
Publication history
  • Published: 2019-06-30
