ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2021, Vol. 58, Issue 6: 1192-1203. doi: 10.7544/issn1000-1239.2021.20210069

Special topic: 2021 Special Issue on Frontiers and Advances in Key Technologies of Computer Chips

• System Architecture •




A High Performance Accelerator Design for Ultra-Long Point Floating-Point FFT

Wang Di, Shi Song, Wu Tiebin, Liu Liang, Tan Hongbing, Hao Ziyu, Guo Feng, Li Hongliang   

  1. (Jiangnan Institute of Computing Technology, Wuxi, Jiangsu 214083)
  • Online: 2021-06-01
  • Supported by: 
    This work was supported by the National Science and Technology Major Projects of Hegaoji (2018ZX01028-102).

Abstract: The fast Fourier transform (FFT) plays a key role in digital signal processing. As demand grows for high-performance ultra-long-point FFTs, the computing power of digital signal processors (DSPs) increasingly falls short, and integrating a dedicated FFT accelerator has become an important trend. To support ultra-long-point FFTs, this paper extends the two-dimensional FFT decomposition algorithm to multiple dimensions and proposes a high-performance ultra-long-point FFT accelerator architecture that can be integrated into a DSP. The architecture implements three-dimensional transposition with a conflict-free bank addressing method based on a prime number of memory banks, generates twiddle factors efficiently with a recurrence algorithm, and refines the FFT arithmetic circuits using single-precision floating-point two-term fused dot-product and fused add-subtract operations. The design supports single-precision floating-point FFTs of up to 4G points. Synthesis results show that the proposed accelerator can run at more than 1 GHz and reach 640 Gflop/s, a substantial improvement over existing work in both supported transform size and performance.
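The two-dimensional decomposition that the abstract generalizes can be illustrated in software. The following is a minimal sketch, not the accelerator's design: it splits an N = n1 × n2 point transform into short sub-transforms, a twiddle multiplication, and a transposition, and it generates each row of twiddle factors by repeated multiplication as one plausible reading of the recurrence approach. The function names `dft` and `fft_2d`, the index mapping, and the use of a direct DFT for the sub-transforms are all assumptions made for illustration.

```python
import cmath

def dft(x):
    """Direct O(n^2) DFT; used as the short sub-transform and as a reference."""
    n = len(x)
    return [sum(x[j] * cmath.exp(-2j * cmath.pi * j * k / n) for j in range(n))
            for k in range(n)]

def fft_2d(x, n1, n2):
    """DFT of length n1*n2 via a two-dimensional decomposition (illustrative)."""
    n = n1 * n2
    assert len(x) == n
    # Step 1: n1 sub-transforms of length n2 over the strided sequences
    # x[a], x[a + n1], x[a + 2*n1], ...
    rows = [dft(x[a::n1]) for a in range(n1)]
    # Step 2: multiply element (a, k2) by the twiddle factor W_n^(a*k2).
    # For fixed a the factors form a geometric sequence, so each one is
    # obtained from the previous one by a single complex multiplication.
    for a in range(n1):
        step = cmath.exp(-2j * cmath.pi * a / n)  # W_n^a
        t = 1 + 0j
        for k2 in range(n2):
            rows[a][k2] *= t
            t *= step
    # Step 3: transpose, then n2 sub-transforms of length n1.
    out = [0j] * n
    for k2 in range(n2):
        col = dft([rows[a][k2] for a in range(n1)])
        for k1 in range(n1):
            out[n2 * k1 + k2] = col[k1]
    return out
```

Applying the same split again to one of the two factors gives a three-dimensional decomposition, which is where a three-dimensional transposition arises. Note that a long multiplicative recurrence accumulates rounding error, so a hardware design would likely need periodic re-seeding or a similar correction; the sketch ignores this.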

Key words: fast Fourier transform (FFT), multi-dimensional decomposition algorithm, three-dimensional transposition operation, twiddle factor generation, accelerator
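The benefit of a prime number of memory banks for transposition can be shown with a small model. This sketch assumes the simplest possible mapping, bank = address mod number of banks, which may differ from the addressing method used in the paper; `bank_of` and `is_conflict_free` are hypothetical helper names. It checks whether a group of strided accesses, as many as there are banks, lands in distinct banks.

```python
def bank_of(addr, num_banks):
    """Assumed mapping for illustration: interleave addresses across banks."""
    return addr % num_banks

def is_conflict_free(base, stride, num_banks):
    """True if num_banks consecutive strided accesses each hit a different bank."""
    banks = {bank_of(base + i * stride, num_banks) for i in range(num_banks)}
    return len(banks) == num_banks

# With 8 banks, a stride of 8 (walking down a column of an 8-wide array)
# sends every access to the same bank.
print(is_conflict_free(0, 8, 8))
# With 7 banks, power-of-two strides are coprime to the bank count,
# so every group of 7 accesses is spread over all 7 banks.
print([is_conflict_free(0, s, 7) for s in (1, 8, 64, 4096)])
```

Under this model, any stride that is not a multiple of the prime bank count is conflict-free. The strides along each axis of a multi-dimensional array with power-of-two dimensions are themselves powers of two, so they all qualify when the bank count is an odd prime.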