ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (7): 1567-1577. doi: 10.7544/issn1000-1239.2019.20180792


A Massively Parallel Bayesian Approach to Factorization-Based Analysis of Big Time Series Data

Gao Tengfei, Liu Yongyan, Tang Yunbo, Zhang Lei, Chen Dan   

  1. (School of Computer Science, Wuhan University, Wuhan 430072)
  • Online: 2019-07-01

Abstract: Big time series data record the evolution of complex systems over large temporal and spatial scales, capturing in great detail the interactions among different parts of a system. Extracting the latent low-dimensional factors plays a crucial role in examining the overall mechanism of the underlying complex system. Research challenges arise from the lack of a priori knowledge, and most conventional factorization methods cannot adapt to the ultra-high dimensionality and scale of big data. To address this grand challenge, this study develops a massively parallel Bayesian approach (G-BF) to factorization-based analysis of tensors formed by massive time series. The approach relies on a Bayesian algorithm to derive the factor matrices in the absence of a priori information. The algorithm is then mapped to the compute unified device architecture (CUDA) model to update the factor matrices in a massively parallel manner. The proposed approach is designed to support factorization of tensors of arbitrary dimensions. Experimental results indicate that 1) in comparison with GPU-hierarchical alternating least squares (G-HALS), G-BF exhibits much better runtime performance, and its advantage becomes more pronounced as the data scale increases; 2) G-BF has excellent scalability in terms of both data volume and rank; 3) when G-BF is applied to the existing framework for fusing sub-factors (hierarchical-parallel factor analysis, H-PARAFAC), it becomes possible to factorize a huge tensor (volume up to 10^{11} over two nodes) as a whole, a capability two orders of magnitude higher than that of conventional methods.
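
As a rough illustration of why factor-matrix updates lend themselves to massive parallelism, the sketch below fits one factor of a rank-1, three-way tensor by plain least squares. It is not the Bayesian update used by G-BF, which the abstract does not specify; the function names (`outer3`, `update_entry`, `update_factor`) and the toy data are assumptions made for this example only. The point it shows is that each entry of the factor depends only on its own tensor slice and on the other (fixed) factors, so every entry could in principle be handled by a separate GPU thread.

```python
# Illustrative sketch only: a rank-1 least-squares factor update,
# NOT the paper's Bayesian algorithm. All names here are hypothetical.

def outer3(a, b, c):
    """Build the rank-1 tensor X[i][j][k] = a[i] * b[j] * c[k]."""
    return [[[ai * bj * ck for ck in c] for bj in b] for ai in a]


def update_entry(x_slice, b, c):
    """Least-squares estimate of a[i] from the slice X[i,:,:] with b, c fixed."""
    num = sum(x_slice[j][k] * b[j] * c[k]
              for j in range(len(b)) for k in range(len(c)))
    den = sum(bj * bj for bj in b) * sum(ck * ck for ck in c)
    return num / den


def update_factor(x, b, c):
    """Update every entry of the first factor.

    Each entry uses only its own slice of X, so the iterations are
    independent; on a GPU each one could map to its own thread.
    """
    return [update_entry(x_slice, b, c) for x_slice in x]


# Toy data: build a tensor from known factors, then recover the first factor.
a_true = [1.0, -2.0, 0.5]
b = [0.5, 1.5]
c = [2.0, 1.0, -1.0, 3.0]
x = outer3(a_true, b, c)
a_est = update_factor(x, b, c)
```

In this noise-free toy case `a_est` matches `a_true` up to floating-point rounding. A full factorization would alternate such updates over all factors and all rank components.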

Key words: Bayesian model, big time series data, tensor factorization, massively parallel computing, compute unified device architecture (CUDA)
