    Bao Guanghui, Zhang Zhaogong, Li Jianzhong, Xuan Ping. Novel MapReduce-Based Similarity Self-Join Method: Filter and In-Circle Algorithm[J]. Journal of Computer Research and Development, 2016, 53(12): 2847-2857. DOI: 10.7544/issn1000-1239.2016.20150794

    Novel MapReduce-Based Similarity Self-Join Method: Filter and In-Circle Algorithm

    • Similarity self-join is an important operation in many applications. For massive data sets, MapReduce provides an effective distributed computing framework on which similarity self-join can be implemented. Problems remain, however: fine-grained partition methods have been applied to clustered data regions for load balancing, but they are not easy to implement, and existing algorithms cannot efficiently perform similarity self-join on massive data sets. In this paper, we propose two novel similarity self-join algorithms on the MapReduce framework, which use coordinate-filtering techniques to obtain valid candidate sets and the in-circle method on hexagon-based partition areas. The coordinate-filtering techniques are based on an equal-width grid partition and exploit the fact that the distance between two points is never smaller than the distance between their projections onto the same axis, so many candidate pairs can be discarded outright. We also prove that the hexagon-based partition is the best among all regular partitions. Our experimental results demonstrate that the novel method outperforms other join algorithms on clustered data regions, improving efficiency by more than 80%. The algorithm can effectively solve the similarity self-join problem for massive data in clustered data regions.
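The coordinate-filtering idea in the abstract can be illustrated with a minimal single-machine sketch. It is not the paper's MapReduce implementation: the function names (`grid_cell`, `passes_coordinate_filter`, `similarity_self_join`), the two-dimensional Euclidean setting, and the choice of a grid cell width equal to the distance threshold `eps` are all assumptions made for illustration. The sketch buckets points into an equal-width grid, pairs each cell only with itself and its adjacent cells, and drops a candidate pair as soon as its separation along either axis exceeds `eps`, since the true distance can never be smaller than that projected distance.

```python
from collections import defaultdict
from itertools import combinations
from math import floor, hypot


def grid_cell(point, eps):
    """Map a 2-D point to its equal-width grid cell (cell width = eps, an assumption)."""
    x, y = point
    return (floor(x / eps), floor(y / eps))


def passes_coordinate_filter(p, q, eps):
    """Cheap test: the distance between two points is at least the distance
    between their projections on any axis, so a per-axis gap above eps
    rules the pair out without computing the full distance."""
    return abs(p[0] - q[0]) <= eps and abs(p[1] - q[1]) <= eps


def similarity_self_join(points, eps):
    """Return sorted index pairs (i, j), i < j, with Euclidean distance <= eps."""
    cells = defaultdict(list)
    for idx, p in enumerate(points):
        cells[grid_cell(p, eps)].append(idx)

    # Only "forward" neighbours, so each pair of adjacent cells is visited once.
    forward = [(1, -1), (1, 0), (1, 1), (0, 1)]
    result = []
    for (cx, cy), members in cells.items():
        # Candidates inside the cell, plus candidates across adjacent cells.
        candidates = list(combinations(members, 2))
        for dx, dy in forward:
            other = cells.get((cx + dx, cy + dy), [])
            candidates.extend((i, j) for i in members for j in other)

        for i, j in candidates:
            p, q = points[i], points[j]
            if not passes_coordinate_filter(p, q, eps):
                continue  # dropped by the coordinate filter
            if hypot(p[0] - q[0], p[1] - q[1]) <= eps:
                result.append((min(i, j), max(i, j)))
    return sorted(result)
```

In a MapReduce setting, the cell key would plausibly serve as the shuffle key, with the filter and distance check running in the reducers; that mapping is likewise an assumption rather than a description of the authors' design.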