ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (2): 333-342. doi: 10.7544/issn1000-1239.2015.20140302

Special Issue: 2015 Big Data Management


Big Data Analysis and Data Velocity

Chen Shimin   

  1. State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190
  Online: 2015-02-01

Abstract: Big data poses three main challenges to the underlying data management systems: volume (a huge amount of data), velocity (the high speed of data generation, acquisition, and updates), and variety (a large number of data types and formats). In this paper, we focus on understanding the significance of velocity and on how to meet this challenge in big data analysis systems. We compare the velocity requirements of transaction processing, data stream, and data analysis systems. We then describe two of our recent research studies that emphasize the role of data velocity in big data analysis systems: 1) MaSM, which supports online data updates in data warehouse systems; and 2) LogKV, which supports high-throughput data ingestion and efficient time-window-based joins in an event log processing system. Comparing the two studies, we find that storing incoming data updates is only the minimum requirement. We should instead treat velocity as an integral part of the data acquisition and analysis life cycle: it is important to analyze the characteristics of the intended big data analysis operations, and then to optimize the data organization and distribution schemes for incoming updates so as to maintain or even improve the efficiency of big data analysis.
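To illustrate the kind of operation the abstract refers to, the following is a minimal, hypothetical sketch of a time-window-based join over two event logs. It is not LogKV's actual implementation or API; all names (`window_join`, the `(timestamp, key, payload)` event layout) are illustrative assumptions. The idea is simply that two events join when their keys match and their timestamps fall within a bounded window of each other.

```python
# Hypothetical sketch of a time-window-based join over two event logs.
# Event layout (timestamp, key, payload) and all names are illustrative,
# not taken from LogKV.
from collections import defaultdict

def window_join(log_a, log_b, window):
    """Join events whose keys match and whose timestamps
    differ by at most `window` time units."""
    # Bucket log_b by key so each log_a event scans only matching candidates.
    by_key = defaultdict(list)
    for ts, key, payload in log_b:
        by_key[key].append((ts, payload))

    results = []
    for ts_a, key, pa in log_a:
        for ts_b, pb in by_key.get(key, ()):
            if abs(ts_a - ts_b) <= window:
                results.append((key, ts_a, ts_b, pa, pb))
    return results

log_a = [(1, "u1", "login"), (10, "u2", "login")]
log_b = [(3, "u1", "click"), (50, "u2", "click")]
print(window_join(log_a, log_b, window=5))
# → [('u1', 1, 3, 'login', 'click')]
```

A system like the one described would additionally partition events by time range across nodes so that each window join runs locally, which is where the choice of data organization and distribution scheme matters.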

Key words: data updates, big data analysis, data warehouse, event log processing system, data organization and distribution
