• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
Advanced Search
Liu Yue, Li Jintao, Hu Songlin. A Cost-Based Splitting Policy Search Algorithm for Hive Multi-Dimensional Index[J]. Journal of Computer Research and Development, 2016, 53(4): 798-810. DOI: 10.7544/issn1000-1239.2016.20151163
Citation: Liu Yue, Li Jintao, Hu Songlin. A Cost-Based Splitting Policy Search Algorithm for Hive Multi-Dimensional Index[J]. Journal of Computer Research and Development, 2016, 53(4): 798-810. DOI: 10.7544/issn1000-1239.2016.20151163

A Cost-Based Splitting Policy Search Algorithm for Hive Multi-Dimensional Index

More Information
  • Published Date: March 31, 2016
  • In the domain of energy Internet, smart city, etc, the massive smart devices collect large amount of data every day, and traditional enterprises need to perform lots of multi-dimensional analysis on these data to support decision-making. Recently, these enterprises try to solve the big data problem with technologies from Internet companies, for example, Hadoop and Hive etc. However, Hive has limited multi-dimensional index ability, and cannot satisfy the requirements of high-performance analysis in traditional enterprises. In this paper, we propose a distributed grid file based multi-dimensional index—DGFIndex to improve the multi-dimensional query performance of Hive. However, DGFIndex needs user to specify the splitting policy when creating index, which is not trivial for user when they are not familiar with data and query pattern. To solve it, we propose a novel MapReduce cost model to measure the DGFIndex-based query performance on specific splitting policy, a two-phase simulated annealing algorithm to search for the suitable splitting policy for DGFIndex, and finally decrease the total cost time of query set. The experimental results show that, DGFIndex improves 50%~114% query performance than original Compact Index in Hive. For static query set, compared with manual-specifying partition policy, our algorithm can choose suitable interval size for each index dimension, and decrease the cost time of query set at most 30%.

Catalog

    Article views (1431) PDF downloads (554) Cited by()
    Turn off MathJax
    Article Contents

    /

    DownLoad:  Full-Size Img  PowerPoint
    Return
    Return