    Efficient and scalable 3D-FFT heterogeneous computing architecture[J]. Journal of Computer Research and Development. DOI: 10.7544/issn1000-1239.202550393

    Efficient and scalable 3D-FFT heterogeneous computing architecture

    The three-dimensional fast Fourier transform (3D-FFT) is a fundamental computational kernel in many high-performance computing (HPC) applications, and the efficiency of its implementation largely determines overall system performance. This paper proposes a heterogeneous computing architecture based on CPU-FPGA co-processing to address the performance bottlenecks of conventional 3D-FFT implementations, namely inefficient memory access patterns, procedural redundancy, and limited parallelism. The design introduces three key architectural innovations: (1) a hierarchical data management strategy that enables multi-channel data transmission with optimized bandwidth utilization, (2) a pipelined synchronization mechanism that overlaps matrix transposition with the FFT computation phases, and (3) a scalable parallel computation architecture that supports up to 128 concurrent FFT processing units. Experimental evaluation covers two problem sizes common in scientific computing, 64×64×64 and 128×128×128 matrices. Compared with CPU-only solutions, the architecture reduces computation time by 69.8% for 64×64×64 matrices and by 57.5% for 128×128×128 matrices. Compared with existing FPGA-accelerated 3D-FFT implementations, the proposed design improves performance by 32.6% and 35.3% for the same two matrix sizes, respectively. These results validate the architecture's effectiveness in optimizing memory hierarchy utilization, improving task synchronization efficiency, and enhancing computational parallelism. The solution provides a scalable and energy-efficient acceleration architecture for HPC systems that require large-scale 3D-FFT computations, particularly in scenarios demanding high throughput and low latency.
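
    For orientation, the alternation of 1D FFT passes and transpositions mentioned in the abstract can be sketched in software. The sketch assumes the standard row-column decomposition of a 3D-FFT (three passes of 1D FFTs, each followed by an axis rotation); it is not the paper's CPU-FPGA design. All names (`fft1d`, `rotate_axes`, `fft3d`, `dft3d_naive`) are hypothetical and chosen for illustration only.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def rotate_axes(a):
    """Return b with b[k][i][j] = a[i][j][k], so a new axis becomes innermost."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    return [[[a[i][j][k] for j in range(n1)] for i in range(n0)]
            for k in range(n2)]


def fft3d(a):
    """3D FFT as three rounds of (1D FFTs on innermost axis, then rotation)."""
    for _ in range(3):
        a = [[fft1d(row) for row in plane] for plane in a]  # FFT phase
        a = rotate_axes(a)                                   # transposition phase
    return a  # three rotations restore the original axis order


def dft3d_naive(a):
    """Direct evaluation of the 3D DFT definition, used only as a reference."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    out = [[[0j] * n2 for _ in range(n1)] for _ in range(n0)]
    for u in range(n0):
        for v in range(n1):
            for w in range(n2):
                s = 0j
                for i in range(n0):
                    for j in range(n1):
                        for k in range(n2):
                            phase = u * i / n0 + v * j / n1 + w * k / n2
                            s += a[i][j][k] * cmath.exp(-2j * cmath.pi * phase)
                out[u][v][w] = s
    return out


def max_abs_diff(a, b):
    """Largest element-wise magnitude difference between two 3D arrays."""
    return max(abs(x - y)
               for pa, pb in zip(a, b)
               for ra, rb in zip(pa, pb)
               for x, y in zip(ra, rb))
```

    In this sketch every call to `rotate_axes` is a full pass over the data that runs strictly after the preceding FFT phase has finished. That serialized transposition step is the kind of cost the abstract describes overlapping with FFT computation through pipelined synchronization.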