一种基于主存Δ-tree的高维数据KNN连接算法

刘  艳; 郝忠孝

一种基于主存Δ-tree的高维数据KNN连接算法

刘艳,
郝忠孝

A KNN-Join Algorithm Based on Δ-Tree for High-Dimensional Data

摘要

摘要: KNN连接作为数据挖掘的基元，可以用来大幅度提高相似搜索、数据分析和数据挖掘的速度.到目前为止，对KNN连接的研究主要在基于磁盘系统的背景下进行，假设数据库太大以至于不能装入主存.随着RAM越来越大，价格也越来越低廉，这种假设逐渐受到挑战.因此，有必要重新对基于主存的KNN连接进行研究.在高效主存索引的基础上，采用编码解码、自底向上、深度优先遍历和剪枝等技术提出了一种新的KNN连接算法Δ-tree-KNN-Join.该算法解决了KNN连接中确定搜索半径困难的问题，提高了连接效率.在真实数据和合成聚类数据上进行了实验，结果显示Δ-tree-KNN-Join是一种有效的主存KNN连接算法.

Abstract: KNN-Join is an important database primitive, which has been successfully applied to speed up applications such as similarity search, data analysis and data mining.Up to now, the KNN-Join has been studied in the context of disk-based systems where it is assumed that the databases are too large to fit into the main memory.This assumption is increasingly being challenged as RAM gets cheaper and larger.Therefore, it is necessary to study the problem of KNN-Join in the main memory. Δ-tree is a novel multi-level index structure that can speed up the high-dimensional query in the main memory environment. In this paper, a new KNN-Join approach is proposed, which uses Δ-tree as the underlying index structure, exploits coding and decoding, bottom up, depth first search and pruning technique.The problem that it is hard to identify the distance from each point in R to its K nearest neighbors in S is solved and the Join efficiency is improved.The correctness verification and cost analysis of the above mentioned algorithm are presented.Extensive experiments on both real datasets and synthetic clustered datasets are conducted, and the results illustrate that Δ-tree-KNN-Join is an efficient KNN-Join method in main memory.

HTML全文

参考文献(0)

施引文献

资源附件(0)