• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
高级检索

DSP芯片中的高能效FFT加速器

雷元武, 陈小文, 彭元喜

雷元武, 陈小文, 彭元喜. DSP芯片中的高能效FFT加速器[J]. 计算机研究与发展, 2016, 53(7): 1438-1446. DOI: 10.7544/issn1000-1239.2016.20160123
引用本文: 雷元武, 陈小文, 彭元喜. DSP芯片中的高能效FFT加速器[J]. 计算机研究与发展, 2016, 53(7): 1438-1446. DOI: 10.7544/issn1000-1239.2016.20160123
Lei Yuanwu, Chen Xiaowen, Peng Yuanxi. A High Energy Efficiency FFT Accelerator on DSP Chip[J]. Journal of Computer Research and Development, 2016, 53(7): 1438-1446. DOI: 10.7544/issn1000-1239.2016.20160123
Citation: Lei Yuanwu, Chen Xiaowen, Peng Yuanxi. A High Energy Efficiency FFT Accelerator on DSP Chip[J]. Journal of Computer Research and Development, 2016, 53(7): 1438-1446. DOI: 10.7544/issn1000-1239.2016.20160123
雷元武, 陈小文, 彭元喜. DSP芯片中的高能效FFT加速器[J]. 计算机研究与发展, 2016, 53(7): 1438-1446. CSTR: 32373.14.issn1000-1239.2016.20160123
引用本文: 雷元武, 陈小文, 彭元喜. DSP芯片中的高能效FFT加速器[J]. 计算机研究与发展, 2016, 53(7): 1438-1446. CSTR: 32373.14.issn1000-1239.2016.20160123
Lei Yuanwu, Chen Xiaowen, Peng Yuanxi. A High Energy Efficiency FFT Accelerator on DSP Chip[J]. Journal of Computer Research and Development, 2016, 53(7): 1438-1446. CSTR: 32373.14.issn1000-1239.2016.20160123
Citation: Lei Yuanwu, Chen Xiaowen, Peng Yuanxi. A High Energy Efficiency FFT Accelerator on DSP Chip[J]. Journal of Computer Research and Development, 2016, 53(7): 1438-1446. CSTR: 32373.14.issn1000-1239.2016.20160123

DSP芯片中的高能效FFT加速器

基金项目: 国家自然科学基金项目(61402499,61502508);湖南省自然科学基金项目(2015JJ3017)
详细信息
  • 中图分类号: TP302

A High Energy Efficiency FFT Accelerator on DSP Chip

  • 摘要: 快速傅里叶变换(fast Fourier transform, FFT)是数字信号处理(digital signal processing, DSP)领域中最耗时的核心算法,该算法的计算性能和计算效率将影响整个应用的执行效率.因此,在DSP芯片上设计实现了一个基于矩阵转置操作的高能效可变长度FFT加速器,采用多种并行策略开发批量小规模FFT算法与大规模Cooley-Tukey FFT算法中指令级和任务级并行.设计“乒乓”多体数据存储器,重叠数据搬移和FFT计算之间的开销,提高FFT加速器计算效率.并基于此存储器,提出基于基本块的快速矩阵转置算法,从而避免对数据矩阵的列访问;提出混合旋转因子产生策略,结合查表和基于CORDIC算法在线计算方式,最大限度降低旋转因子产生的硬件开销.实验结果表明:FFT加速器原型的峰值能效为146 GFLOPs/W,相比Intel Xeon CPU上的多线程FFTW实现,取得2个数量级的能效提升.
    Abstract: Fast Fourier transform (FFT) is a most time-consuming algorithm in the domain of digital signal processing (DSP). The performance and energy efficiency of FFT will make significant effect on different DSP applications. Thus, this paper presents a high energy efficiency variable-size FFT accelerator based on matrix transposition on DSP chip. Several parallel schemes are employed to exploit instruction level parallel and task level parallel of batch of small-size FFTs or big-size Cooley-Tukey FFT. A “Ping-Pong” structure of multi-bank data memory (MBDM) is presented to overlap the overhead of data move and FFT calculation. Moreover, based on MBDM, fast matrix transposition algorithm with basic block transposition is presented to avoid the matrix access with column-wise and improve the utilization of DDR bandwidth. Hybrid twiddle factor generating scheme, combining lookup table and on-line calculation with CORDIC, is presented to reduce the hardware for twiddle factor. Experimental results show that our FFT accelerator prototype with power efficiency of 146 GFLOPs/W, achieves energy efficiency improvement by about two orders of magnitude with multi-thread FFTW on Intel Xeon CPU.
计量
  • 文章访问数:  2267
  • HTML全文浏览量:  1
  • PDF下载量:  946
  • 被引次数: 0
出版历程
  • 发布日期:  2016-06-30

目录

    /

    返回文章
    返回