Zhao Hui, Yang Shuqiang, Chen Zhikun, Yin Hong, and Jin Songchang. Optimization of Range Queries and Analysis for MapReduce Systems[J]. Journal of Computer Research and Development, 2014, 51(3): 606-617.

Optimization of Range Queries and Analysis for MapReduce Systems

  • Published Date: March 14, 2014
  • The MapReduce parallel computing paradigm has recently gained extensive attention from industry and academia, and it works well for massive data processing at Google, Yahoo! and Facebook. However, MapReduce-based systems were originally designed for massive unstructured and semi-structured data, with workloads such as inverted indexing, Web page ranking and log analysis. They were not optimized for structured data: they rely on brute-force scanning, which is inefficient for common structured-data workloads such as selection and filtering. To address this problem, we introduce a global indexing technique, widely used in databases, to optimize queries and analysis over a range of the overall dataset. The global index helps eliminate redundant map tasks, which reduces the cost of I/O and scheduling. Finally, we evaluate our framework at four data selection ratios (80%, 50%, 30% and 10%) under different cluster sizes. We find that response time improves by up to 5x, I/O cost by up to 10x, and scheduling cost by up to 11x.
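
The abstract does not describe the index structure itself, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes the data is range-partitioned by key, so that each input split covers a known key interval. The names `SplitEntry`, `build_index` and `select_splits` are hypothetical. A range query consults the index first and launches map tasks only for splits whose key interval overlaps the query range, instead of scanning every split.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SplitEntry:
    """One global-index entry: the key interval covered by an input split."""
    split_id: int
    min_key: int
    max_key: int  # inclusive


def build_index(num_splits: int, keys_per_split: int) -> List[SplitEntry]:
    """Build a toy index, assuming splits hold consecutive, disjoint key ranges."""
    return [
        SplitEntry(i, i * keys_per_split, (i + 1) * keys_per_split - 1)
        for i in range(num_splits)
    ]


def select_splits(index: List[SplitEntry], low: int, high: int) -> List[SplitEntry]:
    """Return only the splits whose key interval overlaps the range [low, high]."""
    return [e for e in index if e.max_key >= low and e.min_key <= high]


# 10 splits of 100 keys each, covering keys 0..999 (illustrative sizes only).
index = build_index(num_splits=10, keys_per_split=100)

# A range query over keys 0..99 touches a single split, so only one map task
# would be scheduled instead of ten.
selected = select_splits(index, 0, 99)
print(f"map tasks scheduled: {len(selected)} of {len(index)}")
```

In this toy setting the number of scheduled map tasks, and hence the data read, shrinks roughly in proportion to the selection ratio, which is the effect the abstract attributes to the global index.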