Abstract:
Three-dimensional fast Fourier transform(3D-FFT) is widely used in physics. It is crucial to many applications because it demands heavy calculation and communications. Thus in most cases it is 3D-FFT that dominates the computational time. The traditional parallel algorithms of 3D-FFT are not suitable for the sparse lattice which is often encountered in the field of quantum computing, because the block partitioning used may involve many redundant computing and communications, due to the sparse of non-zero elements in FFT grid. In this paper we propose a noval parallel algorithm of 3D-FFT. Unlike the previous methods, the new algorithm uses slice partitioning, and redesigns the computing order in order to minimize the calculation time and communication cost. Taking advantage of the slice partitioning, the new method are highly scalable and can automatically satisfy the demands of load balancing. We compare it with traditional algorithms in theory and in practice. Theoretical performance analysis shows that the new method can greatly reduce the computational time and increase parallel speedup. The experiments have been carried cut in some high-performance machines, such as KD-50, IBM JS22 and DAWNING. The results show that our new algorithm behaves much better than traditional algorithms in performing 3D-FFT for sparse lattice.