Chen Shimin. Big Data Analysis and Data Velocity[J]. Journal of Computer Research and Development, 2015, 52(2): 333-342. DOI: 10.7544/issn1000-1239.2015.20140302
Big data poses three main challenges to the underlying data management systems: volume (a huge amount of data), velocity (high speed of data generation, data acquisition, and data updates), and variety (a large number of data types and data formats). In this paper, we focus on understanding the significance of velocity and on how to address the velocity challenge in big data analysis systems. We compare the velocity requirements of transaction processing, data stream processing, and data analysis systems. We then describe two of our recent research studies, with an emphasis on the role of data velocity in big data analysis systems: 1) MaSM, which supports online data updates in data warehouse systems; 2) LogKV, which supports high-throughput data ingestion and efficient time-window based joins in an event log processing system. Comparing the two studies, we find that storing incoming data updates is only the minimum requirement. Velocity should be considered an integral part of the data acquisition and analysis life cycle. It is important to analyze the characteristics of the desired big data analysis operations, and then to optimize the data organization and data distribution schemes for incoming data updates so as to maintain or even improve the efficiency of big data analysis.