Abstract:
Although Chinese domestic field-programmable gate array (FPGA) manufacturers have developed rapidly in recent years, they still face challenges when deploying FPGA heterogeneous accelerators in data centers. Compared with international manufacturers such as Xilinx (now AMD) and Intel, domestic manufacturers generally lack solutions for high-speed transmission between PCIe devices and hosts, and the gap is most evident in the design of high-performance direct memory access (DMA) controllers. To address this problem, we designed and implemented a PCIe-based multi-channel chained DMA controller. Each channel is managed by an independent descriptor controller, while the data movers are shared across channels, which reduces FPGA logic resource consumption and improves resource efficiency. Descriptors are managed in a chain structure, which reduces CPU interrupt pressure while meeting the requirements for continuous high-speed transmission between host and device. We also developed a novel architecture that pre-processes internal information asynchronously, enabling streaming data processing that significantly improves bandwidth utilization and transmission performance. Test results show that, under PCIe Gen3 x8, the DMA bandwidth between the host and the domestic FPGA accelerator reaches 6.91 GB/s (86% utilization), and that the design supports up to 16 channels with channel balancing. This design effectively enables large-scale deployment of domestic FPGA heterogeneous accelerators in data center scenarios.
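The reported utilization figure can be sanity-checked with a short calculation. The sketch below is illustrative only and rests on an assumption not stated in the abstract: that the 86% figure is measured against the raw PCIe Gen3 x8 signaling rate (8 GT/s per lane, 8 lanes, i.e. 8 GB/s per direction) rather than against the rate after 128b/130b line encoding. The function and variable names are hypothetical.

```python
# Sanity check of the reported DMA bandwidth utilization.
# Assumption (not stated in the abstract): the reference bandwidth is the
# raw PCIe Gen3 x8 rate, before 128b/130b encoding overhead.

GEN3_GT_PER_LANE = 8.0    # PCIe 3.0 signaling rate per lane, in GT/s
LANES = 8                 # x8 link
ENCODING = 128 / 130      # PCIe 3.0 uses 128b/130b line encoding
MEASURED_GBPS = 6.91      # measured DMA bandwidth from the abstract, in GB/s


def raw_bandwidth_gbps(gt_per_lane: float, lanes: int) -> float:
    """Raw link bandwidth per direction in GB/s (1 bit per transfer per lane)."""
    return gt_per_lane * lanes / 8


def utilization(measured_gbps: float, reference_gbps: float) -> float:
    """Fraction of the reference bandwidth achieved by the measured rate."""
    return measured_gbps / reference_gbps


raw_gbps = raw_bandwidth_gbps(GEN3_GT_PER_LANE, LANES)   # 8.0 GB/s
encoded_gbps = raw_gbps * ENCODING                        # about 7.88 GB/s

util_raw = utilization(MEASURED_GBPS, raw_gbps)
util_encoded = utilization(MEASURED_GBPS, encoded_gbps)

print(f"vs raw rate:     {util_raw:.1%}")      # prints "vs raw rate:     86.4%"
print(f"vs encoded rate: {util_encoded:.1%}")  # prints "vs encoded rate: 87.7%"
```

Under this assumption, 6.91 GB/s against 8 GB/s gives roughly 86%, which matches the reported figure. Measured against the encoded rate, the same throughput would correspond to roughly 88%, and transaction-layer packet overhead would lower the achievable ceiling further.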