Abstract:
Big data is characterized by volume, variety, velocity, commodity hardware, and open-source software. In traditional relational databases, materialization can greatly speed up query processing. However, modern big data analysis faces a confluence of growing challenges, as systems become increasingly inefficient and harder to scale. Consequently, this paper presents materialization strategies based on column-store to provide an effective environment for big data analysis. First, it analyzes materialization efficiency using a MapReduce cost model. Second, it designs a MapReduce column-store file and optimizes it with a cooperative localization strategy. Third, according to the different materialization time windows, it proposes materialization strategies in MapReduce based on column-store (MSMC), which comprise three strategies: the MapReduce early materialization strategy (MEMS), the MapReduce late materialization strategy (MLMS), and the MapReduce early-late materialization strategy (MELMS). Fourth, to avoid uncontrolled expansion of the materialization sets, it designs an adaptive materialization set adjustment strategy (AMSAS), which effectively optimizes MSMC. Finally, experiments are conducted to evaluate execution time and load capacity. The results show that MSMC and AMSAS effectively reduce the intermediate data processed by MapReduce, network bandwidth consumption, and unnecessary I/O, which verifies the effectiveness of the proposed method for big data analysis.
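To make the distinction between early and late materialization concrete, the following is a minimal sketch of the general column-store idea, not the paper's MEMS or MLMS implementation. The table contents, the column names (`id`, `price`, `qty`), and both function names are hypothetical and chosen only for illustration; no MapReduce framework is involved.

```python
from typing import Dict, List, Tuple

# Hypothetical column-store table: each column is stored as a separate list.
columns: Dict[str, List[int]] = {
    "id":    [1, 2, 3, 4, 5],
    "price": [10, 55, 30, 80, 5],
    "qty":   [3, 1, 7, 2, 9],
}


def early_materialization(cols: Dict[str, List[int]], threshold: int) -> List[Tuple[int, int]]:
    """Stitch every row together first, then filter the complete rows."""
    rows = list(zip(cols["id"], cols["price"], cols["qty"]))  # all rows are built
    return [(row_id, qty) for (row_id, price, qty) in rows if price > threshold]


def late_materialization(cols: Dict[str, List[int]], threshold: int) -> List[Tuple[int, int]]:
    """Filter a single column to get positions, then fetch only the needed values."""
    positions = [pos for pos, price in enumerate(cols["price"]) if price > threshold]
    return [(cols["id"][pos], cols["qty"][pos]) for pos in positions]


early = early_materialization(columns, 40)
late = late_materialization(columns, 40)
```

Both functions return the same result. The difference is how much data is assembled before the predicate is applied: the early variant builds all five rows, whereas the late variant reads only the `price` column and then fetches values at the two qualifying positions.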