高级检索

    一种基于Z曲线近似k-最近对查询算法

    An Approximate k-Closest Pair Query Algorithm Based on Z Curve

    • 摘要: k-最近对查询是空间数据库中重要操作之一.在低维空间中基于R*树分枝限界最近对查询算法(k-self-CPQ)和Brute-Force算法的查询效率较高,而在高维空间中其性能急剧恶化,降低空间维度成为解决问题的关键.依据Z曲线构造过程,将高维空间分割成大小相等的网格,以此将网格中的点映射到线性空间中.提出了基于网格划分的降维方法及最小网格概念,给出了基于Z曲线近似k-最近对查询算法.利用最小网格的边长,算法优化线性扫描过程.实验结果表明在高维空间中算法性能优于Brute-Fore和k-self-CPQ,且近似k-最近对质量较好.

       

      Abstract: The k-closest pairs query is one of the important operations of spatial database. The k-self-closest pair query algorithm based on R*-tree (k-self-CPQ) and brute-force method could achieve better performance in low-dimensional space, but their performances suffer greatly in high-dimensional space, so the reduction of the dimensionality is the key to the problem. Space-filling curve has been extensively used as a mapping scheme from high-dimensional space into linear space, and imposes a linear order of points in the space. It is like a thread that goes through all the points in the space. Hilbert curve, Gray curve, and Z curve are three important space-filling curves. The mapping of Z curve could apply to high-dimensional space easily. Based on Z curve, a method of the reduction of the dimensionality, a notion of minimum grid, and an approximate k-closest pair algorithm under the L_t-metric (t=1,…,∞) are presented. It uses multiple shifted copies (ZL-set) of the data point sorted according to their position along Z curve. Using the length of minimum grid, it optimizes the procedure of scanning ZL-set. The algorithm is efficient and simple to implement. Experimental results, obtained by using real and synthetic data sets in high-dimensional space, indicate that its performance is better than that of the k-self-CPQ and brute-force methods, and the quality of approximate k-closest pair is better than that of theoretical analysis.

       

    /

    返回文章
    返回