Abstract:
In the recent literature, a variety of index structures have been proposed to facilitate high-dimensional KNN queries, among which the techniques of approximate vector presentation and one dimensional transformation can efficiently break the curse of dimensionality. Based on the two techniques above, a novel high-dimensional index, called bit-code and distance based index (BD) is proposed. On the basis of the data distribution in high-dimensional spaces, the BD partitions the surface of dimensionality in a special way, such that all the data points are divided into lots of partitions according to some cluster centroids. By the definitions of bit code and transformation function, a high-dimensional vector can be first approximately represented and then transformed into a one-dimensional vector, the key managed by a special B\++-tree. During the process of KNN search, the BD enables two levels filtering: the transformation function prunes away points that do not share similar distance ranges, while the bit code component filters away points by the lower bounded distance. A fast algorithm is presented for KNN search, by which one can greatly reduce the number of distance computations and the cost of the tree search. By using both synthetic and real data, the results of extensive experiments demonstrate that the BD outperforms the existing index structures for KNN search in high-dimensional spaces.