高级检索

    利用分区和距离实现高维空间快速KNN查询

    Indexing Bit-Code and Distance for Fast KNN Search in High-Dimensional Spaces

    • 摘要: 在高维空间KNN查询算法中,近似向量和一维转换表示法能有效克服维数灾难,结合这两种思想,提出一种基于区位码和距离的索引结构(BD)以实现快速KNN查询.根据高维空间向量分布特点,合理分区使得大量分布在空间表面的点尽可能地划分到不同的分区中,提高检索剪枝效率.引入区位码概念和转换函数,将高维向量近似表示并转换为一维数值形式,组织成B\++树索引.利用快速KNN查询算法,实现两层过滤,缩小搜索范围,降低树搜索代价.采用模拟数据和真实数据,大量实验验证了BD比其他同类索引具有更高的检索效率.

       

      Abstract: In the recent literature, a variety of index structures have been proposed to facilitate high-dimensional KNN queries, among which the techniques of approximate vector presentation and one dimensional transformation can efficiently break the curse of dimensionality. Based on the two techniques above, a novel high-dimensional index, called bit-code and distance based index (BD) is proposed. On the basis of the data distribution in high-dimensional spaces, the BD partitions the surface of dimensionality in a special way, such that all the data points are divided into lots of partitions according to some cluster centroids. By the definitions of bit code and transformation function, a high-dimensional vector can be first approximately represented and then transformed into a one-dimensional vector, the key managed by a special B\++-tree. During the process of KNN search, the BD enables two levels filtering: the transformation function prunes away points that do not share similar distance ranges, while the bit code component filters away points by the lower bounded distance. A fast algorithm is presented for KNN search, by which one can greatly reduce the number of distance computations and the cost of the tree search. By using both synthetic and real data, the results of extensive experiments demonstrate that the BD outperforms the existing index structures for KNN search in high-dimensional spaces.

       

    /

    返回文章
    返回