ISSN 1000-1239 CN 11-1777/TP

• Paper •

An Approximate k-Closest Pair Query Algorithm Based on Z Curve

Xu Hongbo1 and Hao Zhongxiao1,2,3

1. 1(College of Computer Science and Technology, Harbin University of Science and Technology, Harbin 150080) 2(Department of Computer Science and Technology, Qiqihar University, Qiqihar 161006) 3(College of Computer Science and Technology, Harbin Institute of Technology, Harbin 150001)
• Online:2008-02-15

Abstract: The k-closest pairs query is one of the important operations of spatial database. The k-self-closest pair query algorithm based on R*-tree (k-self-CPQ) and brute-force method could achieve better performance in low-dimensional space, but their performances suffer greatly in high-dimensional space, so the reduction of the dimensionality is the key to the problem. Space-filling curve has been extensively used as a mapping scheme from high-dimensional space into linear space, and imposes a linear order of points in the space. It is like a thread that goes through all the points in the space. Hilbert curve, Gray curve, and Z curve are three important space-filling curves. The mapping of Z curve could apply to high-dimensional space easily. Based on Z curve, a method of the reduction of the dimensionality, a notion of minimum grid, and an approximate k-closest pair algorithm under the L_t-metric (t=1,…,∞) are presented. It uses multiple shifted copies (ZL-set) of the data point sorted according to their position along Z curve. Using the length of minimum grid, it optimizes the procedure of scanning ZL-set. The algorithm is efficient and simple to implement. Experimental results, obtained by using real and synthetic data sets in high-dimensional space, indicate that its performance is better than that of the k-self-CPQ and brute-force methods, and the quality of approximate k-closest pair is better than that of theoretical analysis.