    An Mingyuan, Sun Xiuming, Sun Ninghui. Dynamic Data-Partitioned Online Aggregation[J]. Journal of Computer Research and Development, 2010, 47(11): 1928-1935.

    Dynamic Data-Partitioned Online Aggregation

    • To avoid the performance degradation caused by random I/O, traditional online aggregation algorithms assume that the source data are already randomly ordered in the data file, so that sequential access approximately equals random sampling over the data. This assumption does not hold in many real-world scenarios, and running the algorithms on such data leads to obvious estimation errors. The authors propose a new method, dynamic data-partitioned online aggregation (DDPOA). DDPOA logically splits the data into disjoint partitions, each consisting of consecutive data items in the data file, computes an estimate from each partition individually, and then combines these values with a specific linear combination to estimate the final result. DDPOA weakens the randomization requirement over the whole dataset and makes the estimates more accurate. However, accessing partitioned data can lower performance because of random disk I/O. To address this, DDPOA dynamically adjusts the partitions during execution: adjacent partitions that are judged similar enough are merged into one, which improves I/O performance without losing accuracy. Experiments on a real dataset from the network security monitoring system DBroker show that DDPOA is much more accurate than traditional algorithms, with little performance overhead. On datasets that satisfy the randomization assumption, DDPOA performs as well as the traditional algorithms.
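    The abstract gives no formulas, so the Python below is only a minimal sketch of the partition-then-merge idea under loudly stated assumptions: the aggregate is AVG; each partition's estimate is the mean of the items read so far from its front; the "specific linear combination" is assumed to be a size-weighted sum of partition means (as in stratified sampling); and "similar enough" is assumed to mean that two adjacent partitions' means differ by at most z standard errors. All names (`Partition`, `merge`, `similar`, `ddpoa_avg`, `batch`, `z`, `min_n`) are hypothetical and not taken from the paper.

```python
import math
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Partition:
    """Hypothetical partition: consecutive items of the data file, scanned front to back."""
    size: int                       # number of items the partition covers
    ranges: List[Tuple[int, int]]   # still-unscanned [lo, hi) index ranges, in file order
    n: int = 0                      # items scanned so far
    s: float = 0.0                  # running sum of scanned values
    ss: float = 0.0                 # running sum of squared scanned values

    def scan(self, data: List[float], k: int) -> None:
        """Sequentially read up to k more items from the unscanned ranges."""
        while k > 0 and self.ranges:
            lo, hi = self.ranges[0]
            take = min(k, hi - lo)
            for x in data[lo:lo + take]:
                self.n += 1
                self.s += x
                self.ss += x * x
            k -= take
            if lo + take == hi:
                self.ranges.pop(0)
            else:
                self.ranges[0] = (lo + take, hi)

    def mean(self) -> float:
        return self.s / self.n

    def var(self) -> float:
        # Unbiased sample variance of the items scanned so far (needs n >= 2).
        m = self.mean()
        return max(self.ss - self.n * m * m, 0.0) / (self.n - 1)


def merge(a: Partition, b: Partition) -> Partition:
    """Fuse two adjacent partitions: pool their statistics, keep remaining ranges in order."""
    return Partition(a.size + b.size, a.ranges + b.ranges, a.n + b.n, a.s + b.s, a.ss + b.ss)


def similar(a: Partition, b: Partition, z: float, min_n: int) -> bool:
    """Assumed test: means differ by at most z standard errors once both have min_n items."""
    if a.n < min_n or b.n < min_n:
        return False
    se = math.sqrt(a.var() / a.n + b.var() / b.n)
    return abs(a.mean() - b.mean()) <= z * se


def estimate_avg(parts: List[Partition]) -> float:
    """Assumed linear combination: partition means weighted by partition size."""
    total = sum(p.size for p in parts)
    return sum(p.size / total * p.mean() for p in parts)


def ddpoa_avg(data: List[float], num_parts: int = 8, batch: int = 20,
              rounds: int = 5, z: float = 1.0, min_n: int = 30) -> Tuple[float, int]:
    """Run a few rounds of partitioned online AVG; return (estimate, partitions left)."""
    N = len(data)
    bounds = [N * i // num_parts for i in range(num_parts + 1)]
    parts = [Partition(hi - lo, [(lo, hi)]) for lo, hi in zip(bounds, bounds[1:])]
    for _ in range(rounds):
        for p in parts:
            p.scan(data, batch)  # one sequential batch per partition (a seek between partitions)
        # Merge adjacent partitions judged similar, so later rounds need fewer seeks.
        merged = [parts[0]]
        for p in parts[1:]:
            if similar(merged[-1], p, z, min_n):
                merged[-1] = merge(merged[-1], p)
            else:
                merged.append(p)
        parts = merged
    return estimate_avg(parts), len(parts)


if __name__ == "__main__":
    rng = random.Random(0)
    # A file ordered by some attribute: small values first, large values second.
    data = [rng.gauss(10, 1) for _ in range(5000)] + [rng.gauss(100, 1) for _ in range(5000)]
    est, k = ddpoa_avg(data)
    prefix = data[:800]  # a plain sequential scan; the partitioned run reads at most 800 items
    print(f"partitioned estimate {est:.2f} using {k} partitions")
    print(f"sequential-prefix estimate {sum(prefix) / len(prefix):.2f}")
    print(f"true mean {sum(data) / len(data):.2f}")
```

    On this deliberately clustered input, the sequential-prefix average stays near 10 because it never reaches the second half of the file, while the partitioned estimate lands near the true mean of about 55. Each partition visited per round stands in for a random seek; merging statistically indistinguishable neighbours reduces the number of partitions visited, which is the accuracy-versus-I/O trade-off the abstract describes.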