Xu Kunhao, Nie Tiezheng, Shen Derong, Kou Yue, Yu Ge. Parallel String Similarity Join Approach Based on CPU-GPU Heterogeneous Architecture[J]. Journal of Computer Research and Development, 2021, 58(3): 598-608. DOI: 10.7544/issn1000-1239.2021.20190567

Parallel String Similarity Join Approach Based on CPU-GPU Heterogeneous Architecture

Funds: This work was supported by the National Key Research and Development Program of China (2018YFB1003404) and the National Natural Science Foundation of China (U1811261, 61672142).
Published Date: February 28, 2021
Abstract: Similarity join is an important operation in data cleaning, data integration, and related fields, and has attracted extensive attention in recent years. With growing data volumes, stricter real-time processing requirements, and the slowing improvement of CPU performance, traditional serial similarity join algorithms can no longer meet the demands of big data processing. As a co-processor, the GPU has delivered strong acceleration in machine learning and other fields in recent years, which makes research on GPU-based parallel similarity join algorithms of considerable practical value. This paper proposes a parallel similarity join algorithm based on a CPU-GPU heterogeneous architecture. First, the GPU builds an inverted index in SoA (struct of arrays) layout, which overcomes the low efficiency of traditional index structures under parallel reads and writes. Second, to address the performance limits of serial algorithms, a parallel dual-length filtering algorithm based on the filter-verification framework is proposed, and the inverted index and prefix filtering are used to further improve filtering performance. Exact similarity verification is computed on the CPU, so that both the CPU and GPU computing resources are fully used. Finally, experiments on several datasets show that, compared with serial similarity join algorithms, the proposed algorithms achieve better filtering performance and lower index construction time than existing algorithms, as well as better overall join performance and a higher speedup.
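The abstract names the stages of the filter-verification pipeline without showing them. The following is a minimal sequential sketch in Python, assuming Jaccard similarity over token sets and a self-join; it is not the authors' GPU implementation. The SoA inverted index is modeled as three flat arrays (`tokens`, `offsets`, `rids`), and the function names `build_soa_index` and `similarity_self_join` are hypothetical. The paper's dual-length filter is not described in the abstract, so it is replaced here by the standard length filter t·max(|r|, |s|) ≤ min(|r|, |s|), together with the textbook Jaccard prefix length |r| − ⌈t·|r|⌉ + 1. The probe loop stands in for the parallel filtering stage, and the final exact check for the verification the paper assigns to the CPU.

```python
import math
from collections import Counter

# Tolerance for float error, e.g. 0.07 * 100 evaluates to 7.000000000000001.
# Erring this way only makes the filters more permissive (no false negatives).
EPS = 1e-9


def prefix_len(size, t):
    """Textbook Jaccard prefix length: |r| - ceil(t * |r|) + 1."""
    return size - math.ceil(t * size - EPS) + 1


def passes_length_filter(size_r, size_s, t):
    """Standard length filter (assumed; not the paper's dual-length filter)."""
    return min(size_r, size_s) >= t * max(size_r, size_s) - EPS


def jaccard(a, b):
    inter = len(a & b)
    return inter / (len(a) + len(b) - inter)


def build_soa_index(records, t):
    """Inverted index over prefix tokens, stored as three flat arrays (SoA).

    Postings of tokens[i] are rids[offsets[i]:offsets[i + 1]].
    Each record must already be sorted by the global token order.
    """
    pairs = sorted((tok, rid) for rid, rec in enumerate(records)
                   for tok in rec[:prefix_len(len(rec), t)])
    tokens, offsets, rids = [], [], []
    for tok, rid in pairs:
        if not tokens or tokens[-1] != tok:
            tokens.append(tok)
            offsets.append(len(rids))
        rids.append(rid)
    offsets.append(len(rids))
    return tokens, offsets, rids


def similarity_self_join(raw_records, t):
    """Return index pairs (a, b), a < b, with Jaccard similarity >= t."""
    # Global token order: rarest tokens first, so prefixes stay selective.
    freq = Counter(tok for rec in raw_records for tok in set(rec))
    records = [sorted(set(rec), key=lambda tok: (freq[tok], tok))
               for rec in raw_records]
    tokens, offsets, rids = build_soa_index(records, t)
    slot = {tok: i for i, tok in enumerate(tokens)}

    # Filter phase: probe the index with each prefix, then apply the length filter.
    candidates = set()
    for rid, rec in enumerate(records):
        for tok in rec[:prefix_len(len(rec), t)]:
            i = slot[tok]
            for other in rids[offsets[i]:offsets[i + 1]]:
                if other > rid and passes_length_filter(len(rec), len(records[other]), t):
                    candidates.add((rid, other))

    # Verification phase: exact Jaccard on the surviving candidate pairs.
    return [(a, b) for a, b in sorted(candidates)
            if jaccard(set(records[a]), set(records[b])) >= t - EPS]


if __name__ == "__main__":
    data = [{"a", "b", "c", "d"}, {"a", "b", "c", "e"}, {"x", "y"},
            {"a", "b", "c", "d", "e"}]
    print(similarity_self_join(data, 0.6))  # [(0, 1), (0, 3), (1, 3)]
```

Keeping postings in flat, contiguous arrays instead of per-token list objects means any posting can be located by offset arithmetic alone. That property is what makes an SoA layout attractive for many GPU threads reading and writing the index at once, which is the motivation the abstract gives. The sketch keeps only this layout idea and runs everything on one CPU thread.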