Abstract:
This paper proposes GkNN, a novel k-nearest neighbor query algorithm for data grids. GkNN proceeds in three steps. First, when a user submits a query vector and a value of k, the set of candidate vectors is reduced using the DDM index. Second, the candidate vectors are transferred to the execution nodes using a vector package technique. Third, the candidates are refined in parallel to produce the answer set, which is then returned to the query node. By combining the vector reduction algorithm, the vector package technique, and pipelined parallelism, the algorithm addresses the heterogeneity of network bandwidth between nodes on the data grid. Analysis and experimental results show that the algorithm reduces response time by decreasing network transmission cost and increasing the parallelism of I/O and CPU.
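The reduce, package, and refine flow described above can be illustrated with a minimal single-machine sketch. It is not the GkNN implementation: the abstract does not specify the DDM index, so a simple one-coordinate lower-bound filter with a caller-supplied radius stands in for the reduction step, and local threads stand in for grid execution nodes. All function names and parameters (reduce_vectors, package, refine, gknn_sketch, radius, package_size, workers) are hypothetical, and pipelining across nodes is not modeled.

```python
import heapq
import math
from concurrent.futures import ThreadPoolExecutor


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def reduce_vectors(vectors, query, radius):
    """Hypothetical stand-in for the index-based reduction (NOT the DDM index).

    |v[0] - q[0]| is a lower bound on the Euclidean distance, so any vector
    failing this test cannot lie within `radius` of the query.
    """
    return [i for i, v in enumerate(vectors) if abs(v[0] - query[0]) <= radius]


def package(ids, package_size):
    """Group candidate ids into fixed-size packages (batches) for transfer."""
    return [ids[i:i + package_size] for i in range(0, len(ids), package_size)]


def refine(vectors, query, k, id_package):
    """Compute exact distances for one package and keep its k best (distance, id) pairs."""
    scored = [(distance(vectors[i], query), i) for i in id_package]
    return heapq.nsmallest(k, scored)


def gknn_sketch(vectors, query, k, radius, package_size=2, workers=2):
    """Reduce, package, refine in parallel, then merge partial answers.

    Assumes `radius` is large enough that at least k vectors survive the filter.
    """
    candidates = reduce_vectors(vectors, query, radius)
    packages = package(candidates, package_size)
    # Threads stand in for execution nodes; each refines one package at a time.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda p: refine(vectors, query, k, p), packages))
    # The "query node" merges the partial answer sets into the final top k.
    merged = heapq.nsmallest(k, (pair for part in partials for pair in part))
    return [i for _, i in merged]


vectors = [(0, 0), (1, 1), (5, 5), (0.5, 0), (9, 9)]
print(gknn_sketch(vectors, query=(0, 0), k=2, radius=2))
```

In this sketch, each package returns only its own k best pairs, so the data sent back to the query node is bounded by k per package rather than by the number of candidates, which mirrors the abstract's aim of lowering transmission cost.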