Abstract:
For the design of FFT processor of massive data with length N≠2\+m, zeropadding technique always results in massive wastes of hardware resource, and reduces the processor performance significantly. Presented in this paper is a new FFT algorithm for length N=q×2\+m, where q is a prime integer except for 2. Under the condition of a wider range of choices on different sequence lengths, comparisons with previously reported algorithms show that this algorithm not only makes substantial savings on arithmetic, but also has more regular computational structure which is of great benefit to hardware implementation of the algorithm. Based on this algorithm, a design method of parallel architecture FFT processor is also presented. The dedicated parallel memory mapping algorithm with the feature of minimal memory size relies on the inplace calculation property of the PCW FFT algorithm, and can simultaneously access to q×4 or 4×q data. A hybrid arithmetic unit unifies the arithmetic structures of mixed radix FFT and WFTA of qpoint. Compared with separate arithmetic units, this unit can save hardware resource substantially, which makes more arithmetic units in one processor possible. The hardware simulation of the FFT processor shows that a complex 20480point DFT operation can be done within 7181 clock cycles under the max clock frequency 105MHz.