Abstract:
Since the 1990s, great progress has been made in content-based multimedia information retrieval. A challenging problem emerged at the same time: how to organize high-dimensional vectors so that similarity queries can be answered efficiently. Many index structures have been proposed to solve this problem, such as the R-Tree and its variants, the VA-File, and the A-Tree. Published results indicate that most of these methods achieve good query performance when the dimensionality is below 20, but their performance degrades sharply as the dimensionality increases. To support efficient similarity queries in higher-dimensional spaces, a new index structure called VA-Trie is introduced. The key idea behind VA-Trie is to compress the vectors by quantization and then to organize and manage the resulting approximations in a trie. Experimental results show that VA-Trie outperforms both the A-Tree and sequential scan in high-dimensional spaces.
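The two-step idea (quantize, then index the approximations in a trie) can be illustrated with a minimal sketch. The abstract does not specify the quantization scheme or the trie layout, so everything below is an assumption made for illustration: coordinates are taken to lie in [0, 1), each dimension is split into 2^bits equal-width cells, and each trie level consumes the cell index of one dimension. The names `quantize`, `TrieNode`, and `ApproximationTrie` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def quantize(vector: Sequence[float], bits: int = 2) -> Tuple[int, ...]:
    """Map each coordinate to one of 2**bits equal-width cells.

    Assumption: coordinates are normalized to [0, 1); values outside
    that range are clamped to the first or last cell.
    """
    cells = 1 << bits
    return tuple(max(0, min(int(x * cells), cells - 1)) for x in vector)


class TrieNode:
    """One trie node; children are keyed by the cell index of a dimension."""

    def __init__(self) -> None:
        self.children: Dict[int, "TrieNode"] = {}
        self.ids: List[int] = []  # ids of vectors whose approximation ends here


class ApproximationTrie:
    """Trie over quantized approximations, one level per dimension (assumed layout)."""

    def __init__(self, bits: int = 2) -> None:
        self.bits = bits
        self.root = TrieNode()

    def insert(self, vec_id: int, vector: Sequence[float]) -> None:
        """Quantize the vector and store its id at the matching leaf."""
        node = self.root
        for cell in quantize(vector, self.bits):
            node = node.children.setdefault(cell, TrieNode())
        node.ids.append(vec_id)

    def lookup(self, vector: Sequence[float]) -> List[int]:
        """Return ids of stored vectors that fall in the same cell as the query."""
        node = self.root
        for cell in quantize(vector, self.bits):
            child = node.children.get(cell)
            if child is None:
                return []
            node = child
        return list(node.ids)


trie = ApproximationTrie(bits=2)
trie.insert(0, [0.10, 0.80])
trie.insert(1, [0.20, 0.90])
trie.insert(2, [0.60, 0.30])
print(trie.lookup([0.15, 0.85]))  # [0, 1]
print(trie.lookup([0.90, 0.90]))  # []
```

This sketch only retrieves vectors that share the query's cell. A real similarity query would additionally need distance bounds on the cells to prune trie branches and a refinement step on the original vectors, neither of which is described in the abstract.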