高级检索

    VAR-Tree——一种新的高维数据索引结构

    VAR-Tree—A New High-Dimensional Data Index Structure

    • 摘要: 在多媒体信息检索和数据挖掘等应用领域,实现高维矢量的K近邻搜索是非常具有挑战性的研究课题,为此人们提出了很多种索引结构.然而,现有研究成果表明,随着矢量维数的增加,基于树状索引结构的查询性能急剧下降,例如在R-Tree,X-Tree和SS-Tree中都会出现“维数灾难”.为此,又引入近似压缩的思想,即通过压缩数据来减少查询过程中的磁盘读写代价,例如VA-File等,不过,VA-File没有对近似矢量数据做任何的排序或层次处理.提出了一种新的索引结构VAR-Tree,它将VA-File与R-Tree有机结合起来,用R-Tree管理和组织VA-File中的近似数据,并用已提出的R-Tree类相似查询算法实现基于VAR-Tree的查询.实验结果表明,VAR-Tree较好地提高了检索性能.

       

      Abstract: K nearest neighbor (KNN) search is a very challenging research topic in many application areas, such as multimedia information retrieval and data mining. Lots of index structures have been proposed to solve this problem. However, the query performance based on tree-like index structures would decrease drastically with the increase of dimensionality. As a result, ‘the curse of dimensionality’ is brought about and well known in many index structures like R-Tree, X-Tree and SS-Tree, etc.. Researchers also proposed other methods like VA-File to reduce disk I/O cost by compressing data. However, approximate vectors in VA-File are not sorted or classified hierarchically. In this paper, a new index structure VAR-Tree is proposed, which combines VA-File and R-Tree, and employs R-Tree to manage the approximations. A KNN search algorithm is also presented to perform similarity search in a VAR-Tree. Experimental results show that VAR-Tree has a promising retrieval performance.

       

    /

    返回文章
    返回