    VPM: Materialization Based on Path with Values in Column-Stores

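    • Abstract: Materialization is an indispensable operation in query processing for column-store data warehouses, and the materialization strategy and technique directly affect query execution performance, so designing a materialization strategy and related techniques suited to column-store systems is particularly important. To address the drawback that late materialization may read the same data blocks repeatedly, a materialization technique based on paths with values, VPM for short, is proposed. First, a structure describing the intermediate results of physical execution, the passing block, is defined; it separates the position information used for reconstruction from the actual column values. On this basis, for a given physical query tree, paths are marked according to whether each operator node needs the values of a given column. This generates the paths from scan nodes or value-extraction nodes to the ancestor nodes that ultimately need the columns referenced by those nodes, i.e. the paths with values. The values of the column referenced by the starting node are stored in the value area of the passing block and are filtered continuously as the block is passed up to higher operator nodes in the query tree. For the other columns on a path with values, only position information is stored. During query execution, only the starting node of a path has to read data from disk; the other nodes obtain the corresponding column values directly from the passing block, which effectively reduces the I/O cost of query processing and improves query execution performance. Finally, experiments on DWMS using SSBM, the data-warehouse benchmark dataset from TPC-H, verify the effectiveness of the materialization technique based on paths with values.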

       

      Abstract: Materialization is one of the key issues for query execution in column-stores because it directly influences query performance. It is therefore important to design materialization strategies and related techniques suited to column stores. Existing late materialization may re-read the same data blocks. This paper proposes a materialization technique based on paths with values (VPM). First, a new descriptor structure, called the passing block, is defined for the intermediate results produced during physical execution; in it, the position information of values is stored separately from the values themselves. Based on this, for a given physical query tree, all paths with values from the scan nodes or extraction nodes to their ancestor nodes are generated according to whether the ancestors need the values. Along a path with values, the values of a column are saved in the value area of the passing block if they are needed by the ancestor nodes; otherwise, only the position list is saved. During query execution, the physical operations access data directly from the passing block, which effectively reduces unnecessary I/O cost. Consequently, VPM improves the performance of query execution in column stores. Experimental results on the benchmark data set SSB show the effectiveness of the proposed method.
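The passing-block idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the names `PassingBlock`, `scan`, and `filter`, and the in-memory lists standing in for on-disk columns, are all assumptions made for illustration. The sketch shows a block that keeps positions apart from a value area, carries a column's values upward when an ancestor needs them, and keeps only positions for a column that is fetched later.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class PassingBlock:
    """Hypothetical intermediate result: positions kept apart from values."""
    positions: List[int]                                  # surviving row positions
    values: Dict[str, list] = field(default_factory=dict)  # value area, aligned with positions

    def filter(self, column: str, predicate: Callable) -> "PassingBlock":
        """Filter on a carried column without re-reading it from storage."""
        keep = [i for i, v in enumerate(self.values[column]) if predicate(v)]
        return PassingBlock(
            positions=[self.positions[i] for i in keep],
            values={name: [vals[i] for i in keep]
                    for name, vals in self.values.items()},
        )


def scan(name: str, data: list, carry_values: bool) -> PassingBlock:
    """Assumed scan node: carries values only if the column lies on a path with values."""
    positions = list(range(len(data)))
    return PassingBlock(positions, {name: list(data)} if carry_values else {})


# Toy columns standing in for column files on disk (made-up data).
price = [10, 55, 30, 80]
discount = [1, 2, 3, 4]

# 'price' is needed by an ancestor filter, so its values travel in the block.
block = scan("price", price, carry_values=True)
block = block.filter("price", lambda v: v > 20)

# 'discount' is not carried; it is fetched by position only when required.
late_discount = [discount[p] for p in block.positions]
```

In this toy setup the filter node reads `price` from the block rather than from the column itself, which is the saving the abstract attributes to VPM; how the real system chooses which columns to carry is determined by the path marking on the physical query tree, which this sketch does not model.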

       
