
    Similarity Self-Join Processing for High-Dimensional Data Based on the Main-Memory Δ-tree

    A Δ-Tree Based Similarity Join Processing for High-Dimensional Data

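    • Abstract: As a data mining primitive, the similarity join can be used to greatly speed up similarity search, data analysis and data mining. Most research has concentrated on high-dimensional joins over large amounts of disk-based data. The growing main-memory capacity of today's computers, together with the need for efficient spatial join processing, indicates that spatial joins for a large class of problems can be processed in main memory. The Δ-tree is a recently proposed multi-level index that has been shown to outperform other main-memory indexes. Building on the Δ-tree, this paper therefore proposes a spatial join algorithm, Δ-tree-join, studies its performance, and compares it with the state-of-the-art algorithms EGO-join and EGO*-join. The results show that Δ-tree-join is substantially more efficient than both and is an effective join method.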
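The abstract treats the similarity join as a primitive without spelling out what it computes. As a reference point, here is a minimal brute-force sketch of a similarity self-join, assuming the common ε-join definition (all pairs of points within Euclidean distance ε). The function name `similarity_self_join` and the parameter `eps` are hypothetical. This is only the O(n²·d) nested-loop baseline whose result set index-based methods such as EGO-join or Δ-tree-join aim to produce faster. It is not an implementation of either algorithm.

```python
import math
from itertools import combinations


def similarity_self_join(points, eps):
    """Return every index pair (i, j), i < j, whose Euclidean distance is <= eps.

    Brute-force nested-loop evaluation: O(n^2 * d) distance work.
    Hypothetical helper for illustration; requires Python 3.8+ for math.dist.
    """
    return [
        (i, j)
        for i, j in combinations(range(len(points)), 2)
        if math.dist(points[i], points[j]) <= eps
    ]


if __name__ == "__main__":
    pts = [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (1.0, 1.0, 1.0), (1.05, 1.0, 1.0)]
    print(similarity_self_join(pts, eps=0.2))  # [(0, 1), (2, 3)]
```

Every one of the n(n-1)/2 pairs is examined at full dimensionality. The cost of that exhaustive comparison is what an index-based join tries to avoid.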

       

      Abstract: The similarity join, an important data mining primitive, can be applied to speed up applications such as similarity search, data analysis and data mining. So far, most research has focused on executing high-dimensional joins over large amounts of disk-based data. The increasing size of the main memory available on current computers, together with the need for efficient processing of spatial joins, suggests that spatial joins for a large class of problems can be processed in main memory. The Δ-tree is a novel multi-level index structure that speeds up high-dimensional queries in main memory and has been shown to be an efficient index method. Each level of the Δ-tree represents the data space at a different dimensionality: the number of dimensions increases towards the leaf level, which contains the data at their full dimensionality. The dimensions used at each level are obtained using principal component analysis (PCA). Exploiting these properties of the Δ-tree, a similarity join algorithm based on this index structure, Δ-tree-join, is presented. Its top-down scheme computes distances using fewer dimensions at the upper levels and can therefore complete the join efficiently. Experimental results indicate that Δ-tree-join outperforms the state-of-the-art algorithms EGO-join and EGO*-join by a wide margin and is an efficient similarity join method.
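The key property behind computing distances "using fewer dimensions" can be illustrated without the tree itself. PCA is an orthonormal rotation, so Euclidean distances are unchanged. The squared distance over the first k principal components can never exceed the full squared distance. A pair whose partial distance already exceeds ε can therefore be discarded without examining the remaining dimensions.

The sketch below is a simplified, hypothetical illustration under these assumptions. It is not the authors' Δ-tree-join: it omits the Δ-tree nodes and any node-level pruning, and simply applies the growing-dimension filter to every point pair. The names `jacobi_eigen`, `pca_transform`, `multilevel_self_join` and the `level_dims` schedule are invented for illustration. PCA is computed with a pure-Python cyclic Jacobi eigensolver only to keep the example dependency-free.

```python
import math
import random


def jacobi_eigen(a, max_sweeps=50, tol=1e-18):
    """Eigen-decompose a symmetric matrix with the cyclic Jacobi method.

    Returns (eigenvalues, v); column k of v is the eigenvector for eigenvalues[k].
    """
    n = len(a)
    a = [row[:] for row in a]
    v = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that entry (p, q) becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p, q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p, q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V P accumulates the eigenvectors
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v


def pca_transform(points):
    """Rotate centered points onto their principal axes, highest-variance axis first."""
    n, d = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(d)]
    centered = [[p[j] - mean[j] for j in range(d)] for p in points]
    cov = [[sum(r[i] * r[j] for r in centered) / n for j in range(d)] for i in range(d)]
    eigvals, v = jacobi_eigen(cov)
    order = sorted(range(d), key=lambda k: eigvals[k], reverse=True)
    # Orthonormal projection, so pairwise Euclidean distances are preserved.
    return [[sum(r[i] * v[i][k] for i in range(d)) for k in order] for r in centered]


def multilevel_self_join(points, eps, level_dims):
    """Similarity self-join that checks a growing prefix of PCA dimensions per pair.

    level_dims must be increasing and should end with the full dimensionality,
    e.g. (1, 2, d); otherwise the result is only a candidate superset.
    """
    z = pca_transform(points)
    eps2 = eps * eps
    result = []
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            acc, prev = 0.0, 0
            for k in level_dims:
                # Squared distance over the first k components is a lower bound on
                # the full squared distance, so exceeding eps2 safely prunes the pair.
                acc += sum((z[i][t] - z[j][t]) ** 2 for t in range(prev, k))
                prev = k
                if acc > eps2:
                    break
            else:  # survived every level, including the full-dimensional check
                result.append((i, j))
    return result


if __name__ == "__main__":
    random.seed(0)
    pts = [[random.gauss(0.0, s) for s in (5.0, 2.0, 1.0, 0.5)] for _ in range(50)]
    print(len(multilevel_self_join(pts, 1.5, (1, 2, 4))))
```

PCA orders components by decreasing variance, so the first few dimensions tend to carry most of each distance. As a result, many non-matching pairs are rejected at the cheap early levels. How the actual Δ-tree-join applies this bound to index nodes is not detailed in the abstract and is not modeled here.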

       
