    Wang Di, Shi Song, Wu Tiebin, Liu Liang, Tan Hongbing, Hao Ziyu, Guo Feng, Li Hongliang. A High Performance Accelerator Design for Ultra-Long Point Floating-Point FFT[J]. Journal of Computer Research and Development, 2021, 58(6): 1192-1203. DOI: 10.7544/issn1000-1239.2021.20210069

    A High Performance Accelerator Design for Ultra-Long Point Floating-Point FFT

    Abstract: The fast Fourier transform (FFT) plays a key role in digital signal processing. As demand grows for high-performance, ultra-long-point FFTs, the computing power of digital signal processors (DSPs) is increasingly unable to keep up, so integrating FFT accelerators has become an important trend. To support ultra-long-point FFTs, this paper extends the two-dimensional decomposition algorithm of the FFT to multiple dimensions and proposes a high-performance ultra-long-point FFT accelerator architecture that can be integrated into a DSP. In this architecture, the three-dimensional transposition is implemented with a conflict-free addressing method based on a prime number of memory banks; twiddle factors are generated efficiently by a recursive algorithm; and the FFT datapath is refined using single-precision floating-point two-term fused dot-product and fused add-subtract operations. The design supports single-precision floating-point FFTs of up to 4G points. Synthesis results show that the accelerator runs at more than 1GHz and reaches 640Gflop/s, a substantial improvement over existing work in both supported transform length and performance.
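The abstract only names the multi-dimensional decomposition, so the following is a minimal Python sketch of the general idea, not the accelerator's actual dataflow. It assumes the standard Cooley-Tukey index mapping n = N2·n1 + n2, k = k1 + N1·k2 for a length N = N1·N2 and applies it recursively, so a factor list such as [N1, N2, N3] yields a three-dimensional decomposition: strided column FFTs, a twiddle multiplication, then row FFTs over transposed data. The function names `dft` and `fft_decomposed` are hypothetical, and the small leaf transforms use a direct DFT for clarity rather than speed.

```python
import cmath


def dft(x):
    """Direct O(n^2) DFT, used here as the small leaf transform."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]


def fft_decomposed(x, factors):
    """FFT of len(x) == product(factors) by recursively applying the 2-D split.

    With factors [N1, N2, N3] this is a 3-D decomposition of N = N1*N2*N3.
    """
    n = len(x)
    if len(factors) == 1:
        return dft(x)
    n1 = factors[0]
    n2 = n // n1
    # Step 1: n2 length-n1 FFTs over strided inputs x[n2*a + b], a = 0..n1-1.
    cols = [dft([x[n2 * a + b] for a in range(n1)]) for b in range(n2)]
    # Step 2: multiply by the twiddle factors W_N^(b*k1).
    for b in range(n2):
        for k1 in range(n1):
            cols[b][k1] *= cmath.exp(-2j * cmath.pi * b * k1 / n)
    # Step 3: n1 length-n2 FFTs over the transposed data, decomposed further
    # with the remaining factors; results land at output index k1 + n1*k2.
    out = [0j] * n
    for k1 in range(n1):
        row = fft_decomposed([cols[b][k1] for b in range(n2)], factors[1:])
        for k2 in range(n2):
            out[k1 + n1 * k2] = row[k2]
    return out


x = [complex(i % 5, i % 3) for i in range(60)]
err = max(abs(a - b) for a, b in zip(fft_decomposed(x, [3, 4, 5]), dft(x)))
print(f"max abs error vs direct DFT: {err:.2e}")
```

Here `fft_decomposed(x, [3, 4, 5])` computes a 60-point FFT as a 3×4×5 decomposition and agrees with the direct DFT up to floating-point rounding. In hardware, the reordering in step 3 is where transposition, and therefore memory-bank addressing, becomes the bottleneck.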
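To illustrate why a prime number of memory banks helps with the three-dimensional transposition, here is a small Python sketch. It assumes simple low-order interleaving, bank = addr mod P and offset = addr div P; the abstract does not spell out the paper's exact mapping, and the bank count 17, the array shape and the helper names are hypothetical. When each FFT dimension has a power-of-two length, walking along any axis of a row-major array uses a power-of-two stride s, and P consecutive accesses fall into distinct banks exactly when gcd(s, P) = 1. An odd prime P satisfies this for every power-of-two stride, whereas a power-of-two bank count does not.

```python
def bank_address(addr, num_banks):
    """Map a linear address to (bank, offset-within-bank).

    Low-order interleaving: consecutive addresses go to consecutive banks.
    The mapping is one-to-one for any num_banks >= 1.
    """
    return addr % num_banks, addr // num_banks


def conflict_free(start, stride, num_banks):
    """True if num_banks accesses start, start+stride, ... hit distinct banks."""
    banks = {bank_address(start + i * stride, num_banks)[0] for i in range(num_banks)}
    return len(banks) == num_banks


# Hypothetical 3-D array of shape (8, 16, 32), stored row-major: walking along
# each dimension uses strides 16*32 = 512, 32 and 1 (all powers of two).
strides = (512, 32, 1)
for banks in (16, 17):  # 16 = power of two, 17 = prime
    ok = all(conflict_free(0, s, banks) for s in strides)
    print(f"{banks} banks: conflict-free along every dimension = {ok}")
```

The trade-off of this approach is that reducing an address modulo a non-power-of-two P needs arithmetic logic, whereas a power-of-two bank count only selects low-order address bits.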
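The abstract states only that twiddle factors are generated by a recursive algorithm. A common form of that idea is the recurrence W_N^(k+1) = W_N^k · W_N^1, which replaces a table lookup or sine/cosine evaluation with one complex multiplication per factor. The Python sketch below assumes that recurrence plus periodic reseeding from an exactly computed value to bound accumulated rounding error; the reseeding interval `block` and the function name are hypothetical, not taken from the paper. Python floats are double precision, whereas the accelerator works in single precision, so the error magnitudes here are only indicative.

```python
import cmath


def twiddles_recursive(n, count, block=64):
    """Return W_n^k = exp(-2*pi*i*k/n) for k = 0..count-1.

    Each factor is obtained from the previous one by one complex multiply;
    every `block` steps the running value is reseeded from an exact value
    (hypothetical scheme) so rounding error cannot build up indefinitely.
    """
    step = cmath.exp(-2j * cmath.pi / n)  # W_n^1, the recurrence multiplier
    out = []
    w = 1 + 0j
    for k in range(count):
        if k % block == 0:
            w = cmath.exp(-2j * cmath.pi * k / n)  # reseed: exact W_n^k
        out.append(w)
        w *= step  # W_n^(k+1) = W_n^k * W_n^1
    return out


n = 4096
exact = [cmath.exp(-2j * cmath.pi * k / n) for k in range(n)]
for block in (64, n):  # periodic reseeding vs. a single seed at k = 0
    approx = twiddles_recursive(n, n, block)
    err = max(abs(a - b) for a, b in zip(approx, exact))
    print(f"block={block}: max abs error {err:.2e}")
```

Comparing the two printed errors indicates how much the reseeding interval matters; a shorter interval costs more exact evaluations but limits how many rounded multiplications feed into any one factor.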

       
