Abstract:
The k-closest pairs query is one of the important operations of spatial database. The k-self-closest pair query algorithm based on R*-tree (k-self-CPQ) and brute-force method could achieve better performance in low-dimensional space, but their performances suffer greatly in high-dimensional space, so the reduction of the dimensionality is the key to the problem. Space-filling curve has been extensively used as a mapping scheme from high-dimensional space into linear space, and imposes a linear order of points in the space. It is like a thread that goes through all the points in the space. Hilbert curve, Gray curve, and Z curve are three important space-filling curves. The mapping of Z curve could apply to high-dimensional space easily. Based on Z curve, a method of the reduction of the dimensionality, a notion of minimum grid, and an approximate k-closest pair algorithm under the L_t-metric (t=1,…,∞) are presented. It uses multiple shifted copies (ZL-set) of the data point sorted according to their position along Z curve. Using the length of minimum grid, it optimizes the procedure of scanning ZL-set. The algorithm is efficient and simple to implement. Experimental results, obtained by using real and synthetic data sets in high-dimensional space, indicate that its performance is better than that of the k-self-CPQ and brute-force methods, and the quality of approximate k-closest pair is better than that of theoretical analysis.