PAA: An Efficient Approximate Aggregation Algorithm on Massive Data

Han Xixian, Li Jianzhong, and Gao Hong

Han Xixian, Li Jianzhong, and Gao Hong. PAA: An Efficient Approximate Aggregation Algorithm on Massive Data[J]. Journal of Computer Research and Development, 2014, 51(1): 41-53.

Citation:

Han Xixian, Li Jianzhong, and Gao Hong. PAA: An Efficient Approximate Aggregation Algorithm on Massive Data[J]. Journal of Computer Research and Development, 2014, 51(1): 41-53.

Citation:

Han Xixian, Li Jianzhong, and Gao Hong. PAA: An Efficient Approximate Aggregation Algorithm on Massive Data[J]. Journal of Computer Research and Development, 2014, 51(1): 41-53.

PAA: An Efficient Approximate Aggregation Algorithm on Massive Data

Han Xixian, Li Jianzhong, and Gao Hong

Graphical Abstract

Graphical Abstract

Abstract

Abstract

Aggregation is a commonly used but time-consuming operation in database systems. Relative Compared to exact query, it is often more attractive to return an approximate result with the required error bound to user in a much faster response time. However, we find that none of the previous methods can process approximate aggregation on massive data with arbitrary accuracy and high efficiency. A novel algorithm PAA is proposed to efficiently process approximate aggregation with an arbitrary confidence interval. The data space of dimensional attributes is divided into multiple hypercubes of the same cube size. Each partition maintains the tuples whose dimensional attributes fall into the corresponding hypercube. A random sample RS is pre-constructed on table. PAA consists of two stages. If the approximate result obtained by RS in stage 1 does not satisfy the confidence interval, it is required to retrieve more random tuples from partition set IPS whose hypercubes overlap with search region in stage 2. The novelty of PAA lies in how to retrieve random tuples from IPS when the exact number of tuples satisfying predicate in each partition is unknown and how to reduce random I/O cost of retrieval operation as much as possible. The experimental results show that PAA obtains up to two orders of magnitude speedup compared with the existing methods.

FullText(HTML)

References (0)

Cited By

Turn off MathJax

Article Contents

PAA: An Efficient Approximate Aggregation Algorithm on Massive Data

Graphical Abstract

Abstract

Catalog

Export File

Citation

Format

Content