Xu Kunhao, Nie Tiezheng, Shen Derong, Kou Yue, Yu Ge. Parallel String Similarity Join Approach Based on CPU-GPU Heterogeneous Architecture[J]. Journal of Computer Research and Development, 2021, 58(3): 598-608. DOI: 10.7544/issn1000-1239.2021.20190567

Parallel String Similarity Join Approach Based on CPU-GPU Heterogeneous Architecture

Funds: This work was supported by the National Key Research and Development Program of China (2018YFB1003404) and the National Natural Science Foundation of China (U1811261, 61672142).
  • Published Date: February 28, 2021
  • Similarity join is an important task in data cleaning, data integration, and related fields, and it has attracted extensive attention in recent years. As data volumes grow, real-time processing requirements tighten, and CPU performance gains stall, traditional serial similarity join algorithms can no longer meet the demands of big data processing. GPUs, used as co-processors, have delivered substantial speedups in machine learning and other fields in recent years, so studying GPU-based parallel similarity join algorithms is of great practical significance. This paper proposes a parallel similarity join algorithm based on a CPU-GPU heterogeneous architecture. First, the GPU is used to build an inverted index based on an SoA (struct of arrays) layout, which addresses the low efficiency of traditional index structures under parallel reads and writes. Then, to address the performance limits of serial algorithms, a parallel dual-length filtering algorithm based on the filter-verification framework is proposed, with the inverted index and prefix filtering used to further improve filtering performance. Exact similarity verification is performed on the CPU so that the heterogeneous computing resources of both CPU and GPU are fully used. Finally, experiments are carried out on several datasets. The results show that, compared with existing serial similarity join algorithms, the proposed algorithms achieve better filtering performance and lower index generation time, as well as better processing performance and a higher speedup ratio on the similarity join.
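
To make the filter-verification pipeline in the abstract concrete, the following is a minimal sequential sketch, not the authors' GPU implementation. The abstract does not specify the similarity function, the tokenization, or the details of the dual-length filter, so everything here is an assumption: Jaccard similarity over whitespace-separated tokens, the standard textbook length filter and prefix filter, and an inverted index stored as flat parallel arrays (`token_ids`, `offsets`, `postings`) to illustrate the SoA idea. All function names are hypothetical.

```python
from collections import Counter
from math import ceil

EPS = 1e-9  # guards against floating-point error at filter boundaries


def canonicalize(strings):
    """Turn strings into token-rank lists sorted rarest-token-first (assumed tokenization)."""
    token_sets = [set(s.split()) for s in strings]
    freq = Counter(tok for ts in token_sets for tok in ts)
    order = sorted(freq, key=lambda t: (freq[t], t))
    rank = {tok: i for i, tok in enumerate(order)}
    return [sorted(rank[t] for t in ts) for ts in token_sets]


def prefix_length(size, threshold):
    """Standard Jaccard prefix length: size - ceil(threshold * size) + 1."""
    return size - ceil(threshold * size - EPS) + 1


def jaccard(a, b):
    """Exact Jaccard similarity, used in the verification step."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if (sa or sb) else 1.0


def build_soa_index(records, threshold):
    """Index prefix tokens in flat arrays: the postings of token_ids[k]
    are postings[offsets[k]:offsets[k + 1]]."""
    lists = {}
    for rid, rec in enumerate(records):
        for tok in rec[:prefix_length(len(rec), threshold)]:
            lists.setdefault(tok, []).append(rid)
    token_ids, offsets, postings = sorted(lists), [0], []
    for tok in token_ids:
        postings.extend(lists[tok])
        offsets.append(len(postings))
    return token_ids, offsets, postings


def similarity_self_join(strings, threshold):
    """Return all pairs (i, j), i < j, with Jaccard similarity >= threshold."""
    records = canonicalize(strings)
    token_ids, offsets, postings = build_soa_index(records, threshold)
    slot = {tok: k for k, tok in enumerate(token_ids)}
    results = []
    for rid, rec in enumerate(records):
        n = len(rec)
        candidates = set()
        # Filter phase: prefix filter via the index, then length filter.
        for tok in rec[:prefix_length(n, threshold)]:
            k = slot[tok]
            for other in postings[offsets[k]:offsets[k + 1]]:
                if other <= rid:
                    continue  # report each unordered pair once
                m = len(records[other])
                if threshold * n <= m + EPS and threshold * m <= n + EPS:
                    candidates.add(other)
        # Verification phase: exact similarity on the surviving candidates.
        for other in sorted(candidates):
            if jaccard(rec, records[other]) >= threshold:
                results.append((rid, other))
    return results
```

For example, `similarity_self_join(["parallel string similarity join", "parallel string similarity join gpu", "image fusion at similar scale", "string similarity join"], 0.7)` returns `[(0, 1), (0, 3)]`. The flat `offsets`/`postings` layout is the kind of structure that lends itself to coalesced parallel reads; in a heterogeneous design such as the one the abstract describes, the filter phase would run on the GPU and the verification loop on the CPU.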
