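• China's Outstanding Science and Technology Journal
  • CCF-recommended Class A Chinese-language journal
  • T1-class high-quality science and technology journal in the computing field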
Li Yan, Zhang Yunquan. An Automatic Performance Tuning Framework for FFT on Heterogenous Platforms[J]. Journal of Computer Research and Development, 2014, 51(3): 637-649.

An Automatic Performance Tuning Framework for FFT on Heterogenous Platforms

  • Published Date: March 14, 2014
  • The fast Fourier transform (FFT) is an important computational kernel in scientific and engineering computing, with broad applicability in signal processing, image processing, and the solution of partial differential equations. In this paper, we propose an automatic performance tuning framework called MPFFT (massively parallel FFT), which is well suited to heterogeneous platforms such as the GPU (graphics processing unit) and the APU (accelerated processing unit). MPFFT adapts in two stages: at installation time and at runtime. At installation time, a code generator automatically produces FFT codelets of arbitrary size, which are called by the GPU kernels. At runtime, the code generator also produces highly optimized GPU kernel code, guided by auto-tuning techniques. Experimental results show that MPFFT substantially outperforms the clAmdFft library on both AMD GPUs and APUs. For 1D, 2D, and 3D FFTs, the average speedup of MPFFT over clAmdFft 1.6 reaches 3.45, 15.20, and 4.47 on the AMD APU A-360, and 1.75, 3.01, and 1.69 on the AMD HD7970. MPFFT also performs comparably to the CUFFT library on NVIDIA GPUs: on the Tesla C2050 its overall performance reaches 93% of CUFFT 4.1, with a maximum speedup of 1.28.
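The two-stage idea described in the abstract can be illustrated with a small CPU-only sketch. It is not MPFFT's implementation: the paper targets OpenCL GPU kernels, whereas this sketch uses plain Python, and every name in it (`generate_codelet`, `compile_codelet`, `fft`, `autotune`) is hypothetical. It assumes the transform size is a power of each candidate radix. An "installation-time" generator emits an unrolled DFT codelet for a given radix, and a "runtime" tuner times each candidate radix and keeps the fastest.

```python
import cmath
import time


def generate_codelet(n):
    """Return Python source for a fully unrolled size-n DFT (hypothetical generator)."""
    lines = [f"def dft{n}(x):", "    return ["]
    for k in range(n):
        terms = []
        for j in range(n):
            # Twiddle factors are computed once here and baked in as constants.
            w = cmath.exp(-2j * cmath.pi * j * k / n)
            terms.append(f"x[{j}] * complex({w.real!r}, {w.imag!r})")
        lines.append("        " + " + ".join(terms) + ",")
    lines.append("    ]")
    return "\n".join(lines)


def compile_codelet(n):
    """Compile the generated source and return the callable codelet."""
    namespace = {}
    exec(generate_codelet(n), namespace)
    return namespace[f"dft{n}"]


def fft(x, radix, codelet):
    """Decimation-in-time FFT; len(x) is assumed to be a power of `radix`."""
    n = len(x)
    if n == 1:
        return list(x)
    m = n // radix
    # Transform the `radix` interleaved subsequences recursively.
    subs = [fft(x[p::radix], radix, codelet) for p in range(radix)]
    out = [0j] * n
    for k in range(m):
        # Apply the twiddle factors, then combine with the small codelet.
        t = [subs[p][k] * cmath.exp(-2j * cmath.pi * p * k / n)
             for p in range(radix)]
        y = codelet(t)
        for q in range(radix):
            out[k + m * q] = y[q]
    return out


def autotune(n, candidates=(2, 4, 8), repeats=3):
    """Time each candidate radix on a size-n input and return the fastest."""
    x = [complex(i, 0) for i in range(n)]
    best = None
    for radix in candidates:
        codelet = compile_codelet(radix)
        start = time.perf_counter()
        for _ in range(repeats):
            fft(x, radix, codelet)
        elapsed = time.perf_counter() - start
        if best is None or elapsed < best[0]:
            best = (elapsed, radix, codelet)
    return best[1], best[2]


if __name__ == "__main__":
    radix, codelet = autotune(64)  # 64 is a power of 2, 4, and 8
    spectrum = fft([complex(i, 0) for i in range(64)], radix, codelet)
    print("selected radix:", radix)
```

The selected radix depends on the machine and on timing noise, which is the reason for measuring at runtime rather than fixing the choice in advance. A real tuner would search a much larger space, for example work-group sizes and memory layouts on a GPU.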
