ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2017, Vol. 54, Issue (7): 1592-1602. doi: 10.7544/issn1000-1239.2017.20160558


A Multi-Way Spatial Join Querying Processing Algorithm Based on Spark

Qiao Baiyou1,2, Zhu Junhai1, Zheng Yujie1, Shen Muchuan1, Wang Guoren1   

1. School of Computer Science and Engineering, Northeastern University, Shenyang 110819; 2. Department of Computer Science, Brigham Young University, Provo, Utah, USA 84602

Online: 2017-07-01

Abstract: To address spatial join query processing in cloud computing systems, this paper proposes BSMWSJ, a multi-way spatial join query processing algorithm built on the Spark platform. The algorithm uses grid partitioning to divide the whole data space into equal-sized grid cells and assigns the spatial objects of each data set to these cells according to their spatial locations, so that objects in different grid cells can be processed in parallel. For multi-way spatial joins, a boundary filtering method computes the MBRs (minimum bounding rectangles) of the candidate results produced by the previous join step and uses these MBRs to filter the data sets involved in subsequent joins. This removes spatial objects that cannot contribute to the result and reduces redundant projection and replication of spatial objects. In addition, a duplication avoidance strategy reduces the output of redundant results, which further lowers the cost of subsequent join processing. Experiments on synthetic and real data sets show that BSMWSJ outperforms existing multi-way spatial join query processing algorithms.
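The three ideas named in the abstract (grid partitioning, boundary filtering, duplication avoidance) can be illustrated with a minimal single-machine Python sketch. This is not the paper's Spark implementation. It rests on several assumptions that the abstract does not confirm: every object is represented by its MBR `(xmin, ymin, xmax, ymax)`, the query is a chain join A-B-C with an intersection predicate, the boundary filter uses one bounding MBR over the whole intermediate result, and duplicates are avoided with the common reference-point technique. The names `grid_join`, `chain_join`, and `cell` are hypothetical.

```python
from collections import defaultdict
from itertools import product


def intersects(a, b):
    """True if two MBRs (xmin, ymin, xmax, ymax) overlap or touch."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def cells_of(rect, cell):
    """All grid cells (col, row) of side length `cell` that rect overlaps."""
    x0, y0 = int(rect[0] // cell), int(rect[1] // cell)
    x1, y1 = int(rect[2] // cell), int(rect[3] // cell)
    return product(range(x0, x1 + 1), range(y0, y1 + 1))


def grid_join(left, right, cell):
    """Two-way intersection join of {id: mbr} dicts via a uniform grid."""
    buckets = defaultdict(lambda: ([], []))
    for i, r in left.items():
        for c in cells_of(r, cell):      # replicate into every overlapped cell
            buckets[c][0].append(i)
    for j, r in right.items():
        for c in cells_of(r, cell):
            buckets[c][1].append(j)

    out = []
    for c, (ls, rs) in buckets.items():  # each cell could be a parallel task
        for i, j in product(ls, rs):
            a, b = left[i], right[j]
            if not intersects(a, b):
                continue
            # Duplication avoidance (assumed reference-point rule): report the
            # pair only in the cell holding the lower-left corner of the overlap.
            ref = (int(max(a[0], b[0]) // cell), int(max(a[1], b[1]) // cell))
            if ref == c:
                out.append((i, j))
    return out


def bounding_mbr(rects):
    """Smallest MBR enclosing all given MBRs."""
    rects = list(rects)
    return (min(r[0] for r in rects), min(r[1] for r in rects),
            max(r[2] for r in rects), max(r[3] for r in rects))


def chain_join(a, b, c, cell):
    """Chain join A-B-C: a intersects b, and b intersects c."""
    ab = grid_join(a, b, cell)
    if not ab:
        return []
    # Boundary filtering (simplified): any c that can still join must overlap
    # the MBR enclosing the B objects present in the intermediate result.
    bound = bounding_mbr(b[j] for _, j in ab)
    c_kept = {k: r for k, r in c.items() if intersects(r, bound)}
    b_used = {j: b[j] for _, j in ab}
    bc = defaultdict(list)
    for j, k in grid_join(b_used, c_kept, cell):
        bc[j].append(k)
    return sorted((i, j, k) for i, j in ab for k in bc[j])


a = {"a1": (0, 0, 3, 3), "a2": (10, 10, 11, 11)}
b = {"b1": (2, 2, 6, 6), "b2": (20, 20, 21, 21)}
c = {"c1": (5, 5, 7, 7), "c2": (30, 30, 31, 31)}
print(chain_join(a, b, c, cell=4))  # [('a1', 'b1', 'c1')]
```

In this toy input, `c2` is discarded by the bounding-MBR test before it is ever replicated into grid cells, which is the kind of saving the boundary filter is meant to provide. A finer-grained filter, for example one MBR per grid cell, would prune more but is likewise an assumption rather than something the abstract specifies.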

Key words: cloud computing, Spark platform, multi-way spatial join query, boundary filtering, duplication avoidance
