ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (5): 1061-1070. doi: 10.7544/issn1000-1239.2015.20140693


Materialization Strategies in Big Data Analysis System Based on Column-Store

Zhang Bin1,2, Le Jiajin1, Sun Li1, Xia Xiaoling1, Wang Mei1, Li Yefeng1   

  1. College of Computer Science and Technology, Donghua University, Shanghai 201620; 2. Zhejiang University of Finance & Economics, Hangzhou 310018
  • Online: 2015-05-01

Abstract: Big data is characterized by volume, variety, and velocity, and is typically processed on commodity hardware with open-source software. In traditional relational databases, materialization can greatly speed up query processing. However, modern big data analysis faces a confluence of growing challenges, as systems become increasingly inefficient and hard to scale. Consequently, this paper presents materialization strategies based on column-store to provide an effective environment for big data analysis. First, it uses a MapReduce cost model to analyze what affects materialization efficiency. Second, it designs a MapReduce column-store file and optimizes it with a cooperative localization strategy. Third, according to the different materialization time windows, it proposes materialization strategies in MapReduce based on column-store (MSMC), composed of three strategies: the MapReduce early materialization strategy (MEMS), the MapReduce late materialization strategy (MLMS), and the MapReduce early-late materialization strategy (MELMS). Fourth, to avoid uncontrolled growth of the materialization sets, it designs an adaptive materialization set adjustment strategy (AMSAS), which effectively optimizes MSMC. Finally, experiments are conducted to evaluate execution time and load capacity. The results show that MSMC and AMSAS effectively reduce MapReduce intermediate data processing, network bandwidth usage, and unnecessary I/O, which verifies the effectiveness of the proposed method for big data analysis.
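
As a rough illustration of the early versus late distinction named in the abstract, the sketch below contrasts the two ideas on a toy in-memory column-store. It is not the paper's MEMS or MLMS and involves no MapReduce; the table, column names, and function names are all hypothetical, and the "tuples built" counter is only a stand-in for materialization cost.

```python
# Toy column-store: each column is a separate list; row i is formed by position i.
# All names and values here are hypothetical, chosen only for illustration.
columns = {
    "id":    [1, 2, 3, 4, 5],
    "price": [10, 55, 30, 80, 5],
    "city":  ["SH", "HZ", "SH", "BJ", "HZ"],
}


def early_materialization(cols, needed, predicate_col, predicate):
    """Stitch the needed columns into row tuples first, then filter the rows."""
    rows = list(zip(*(cols[name] for name in needed)))
    idx = needed.index(predicate_col)
    tuples_built = len(rows)  # every row is materialized before filtering
    result = [row for row in rows if predicate(row[idx])]
    return result, tuples_built


def late_materialization(cols, needed, predicate_col, predicate):
    """Filter on the predicate column alone, then build tuples only for matches."""
    positions = [i for i, v in enumerate(cols[predicate_col]) if predicate(v)]
    result = [tuple(cols[name][i] for name in needed) for i in positions]
    return result, len(result)  # only qualifying rows are materialized


needed = ["id", "price", "city"]
early_rows, early_built = early_materialization(
    columns, needed, "price", lambda p: p > 40
)
late_rows, late_built = late_materialization(
    columns, needed, "price", lambda p: p > 40
)
```

Both functions return the same rows; they differ only in how many tuples are constructed along the way, which is the kind of trade-off a materialization strategy has to weigh.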

Key words: big data, column-store, materialization strategy (MS), MapReduce, analysis system
