Systematic generation of FPGA-based FFT implementations
In this paper, we propose a systemic approach for synthesizing field-programmable gate array (FPGA) implementations of fast Fourier transform (FFT) computations. Our approach considers both cost (in terms of FPGA resource requirements), and performance (in terms of throughput), and optimizes for both of these dimensions based on user-specified requirements. Our approach involves two orthogonal techniques-FFT inner loop unrolling and outer loop unrolling - to perform design space exploration in terms of cost and performance. By appropriately combining these two forms unrolling, we can achieve cost-optimized FFT implementations in terms of FPGA slices or block RAMs in FPGA, subject to the required throughput. We compared the results of our synthesis approach with a recently-introduced commercial FPGA intellectual property (IP) core - the FFT IP module in the Xilinx LogiCore Library, which provides different FFT implementations that are optimized for a limited set of performance levels. Our results demonstrate efficiency levels that are in some cases better than these commercial IP blocks. At the same time, our approach provides the advantages of being able to optimize implementations based on arbitrary, user-specified performance levels, and of being based on general formulations of FFT loop unrolling trade-offs, which can be retargeted to different kinds of FPGA devices.