Abstract:
When deploying models on resource-constrained FPGAs, traditional convolutional neural network accelerators and inference frameworks often face challenges such as various device types, extremely limited resources, insufficient data bandwidth utilization, complex operator types that are difficult to match operators and schedule computing task reasonably. In this paper, a sparse acceleration framework of convolutional neural network (SAF-CNN) for embedded FPGA is proposed. Through the method of software and hardware co-design, SAF-CNN is jointly optimized from the two perspectives of hardware accelerator design and software inference framework. SAF-CNN first constructs parallel computing array and designs parallel encoding and decoding scheme to realize single-period multi-data transmission and effectively reduce communication costs. Secondly, a fine-grained structured block partitioning pruning algorithm is designed to obtain a sparse and regular weight matrix by cutting the input channel dimension within the block, so as to significantly reduce the computation scale and the resource utilization of DSP multiplier. Then, the input channel dimension dynamic expansion method and runtime scheduling strategy compatible with depth-separable convolution is proposed to realize flexible adaptation of input channel parameters and resource reuse of point-wise convolution and depth-wise convolution. Finally, the computational graph reconstruction method and hardware operator fusion are used to improve the hardware execution efficiency. The experiments use two resource-limited low-end FPGA heterogeneous platforms, Intel CycloneV and Xilinx ZU3EG. The results show that the SAF-CNN accelerator can achieve the computational performance of 76.3GOPS and 494.3GOPS respectively. Compared with multi-core CPU, SAF-CNN can achieve 3.5x and 2.2x performance improvement on the object detection model of SSD_MobileNetV1, and the model inference speed is up to 26.5fps.