Abstract:
Materialization is a key issue for query execution in column stores because it directly affects query performance. Column stores therefore need a well-designed set of materialization strategies and supporting techniques. Existing late materialization strategies may re-read the same data blocks. This paper proposes a materialization technique based on paths with values (VPM). First, a new descriptor structure, called a passing block, is defined for the intermediate results produced during physical execution; it stores the position information of values separately from the values themselves. Based on this structure, for a given physical query tree, all efficient paths with values from the scanned nodes or extracted nodes to their ancestor nodes are generated according to whether the ancestors need the values. Following these paths, the values of a column are saved in the value area of the passing block if the ancestor nodes need them; otherwise, only the position list is saved. During query execution, the physical operations access data directly from the passing block, which reduces unnecessary I/O cost. Consequently, VPM improves the performance of query execution in column stores. Experimental results on the SSB benchmark data set show the effectiveness of the proposed method.
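
The passing-block idea can be illustrated with a minimal sketch. This is not the paper's implementation: the names `PassingBlock`, `scan_filter`, `read_values`, and the `io_counter` dictionary are hypothetical, the columns are plain in-memory lists, and the set of columns needed by ancestor nodes is assumed to be given rather than derived from a physical query tree.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set


@dataclass
class PassingBlock:
    """Hypothetical descriptor: a position list plus a separate value area."""
    positions: List[int]
    # Value area: column name -> values aligned with `positions`.
    values: Dict[str, List[int]] = field(default_factory=dict)


def scan_filter(name: str, column: List[int],
                predicate: Callable[[int], bool],
                needed_by_ancestors: Set[str]) -> PassingBlock:
    """Scan a column; keep values only if an ancestor node needs them."""
    positions = [i for i, v in enumerate(column) if predicate(v)]
    block = PassingBlock(positions)
    if name in needed_by_ancestors:
        # The column lies on a path with values: carry the values along.
        block.values[name] = [column[i] for i in positions]
    return block


def read_values(block: PassingBlock, name: str, column: List[int],
                io_counter: Dict[str, int]) -> List[int]:
    """Serve values from the passing block; otherwise re-read the column."""
    if name in block.values:
        return block.values[name]
    # Fallback models the extra column access that the value area avoids.
    io_counter[name] = io_counter.get(name, 0) + 1
    return [column[i] for i in block.positions]


price = [5, 20, 7, 30]
io_counter: Dict[str, int] = {}

with_values = scan_filter("price", price, lambda v: v > 6, {"price"})
read_values(with_values, "price", price, io_counter)   # no re-read counted

positions_only = scan_filter("price", price, lambda v: v > 6, set())
read_values(positions_only, "price", price, io_counter)  # one re-read counted
```

In this sketch, the first block answers the later request from its value area, while the second holds only the position list and must go back to the column, which is the re-read that the abstract attributes to existing late materialization.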