Xie Kunpeng, Lu Ye, Jin Zongming, Liu Yiqing, Gong Cheng, Chen Xinwei, Li Tao. FAQ-CNN: A Flexible Acceleration Framework for Quantized Convolutional Neural Networks on Embedded FPGAs[J]. Journal of Computer Research and Development, 2022, 59(7): 1409-1427. DOI: 10.7544/issn1000-1239.20210142
1(College of Computer Science, Nankai University, Tianjin 300350)
2(Tianjin Key Laboratory of Network and Data Security Technology (Nankai University), Tianjin 300350)
3(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190)
4(Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University), Fuzhou 350108)
Funds: This work was supported by the National Key Research and Development Program of China (2018YFB2100304), the National Natural Science Foundation of China (62002175), the Open Project Fund of State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences) (CARCHB202016), the Special Funding for Excellent Enterprise Technology Correspondent of Tianjin (21YDTPJC00380), the Open Project of Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (Minjiang University) (MJUKF-IPIC202105), and the Innovation Fund of Chinese Universities Industry-University-Research (2020HYA01003).
Quantization can compress the model size of convolutional neural networks (CNNs) and improve computing efficiency. However, existing accelerator designs for quantized CNNs usually face challenges such as the diversity of quantization algorithms, poor reusability of code modules, inefficient data exchange, and insufficient resource utilization. To meet these challenges, we propose FAQ-CNN, a flexible acceleration framework for quantized CNNs that optimizes accelerator design in three aspects: computing, communication, and storage. FAQ-CNN supports rapid deployment of quantized CNN models in the form of software tools. First, a quantization-algorithm component is designed to separate the calculation part of a quantization algorithm from its value-projection process; optimization techniques such as operator fusion, double buffering, and pipelining are also applied to improve the parallel execution efficiency of CNN inference. Then, a hierarchical, bitwidth-independent encoding scheme and a parallel decoding method are proposed to efficiently support batch transmission of and parallel computing on low-bitwidth data. Finally, a resource allocation optimization model, which can be transformed into an integer nonlinear programming problem, is established for FAQ-CNN, and a heuristic pruning strategy is used to reduce the size of the design space. Extensive experimental results show that FAQ-CNN can support almost all kinds of quantized CNN accelerators efficiently and flexibly. When activations and weights are quantized to 16 bits, the computing performance of the FAQ-CNN accelerator is 1.4 times that of Caffeine; with an 8-bit configuration, FAQ-CNN achieves a superior performance of 1.23 TOPS.
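To make the separation of projection and calculation concrete, here is a minimal C++ sketch of the idea, assuming a plain uniform symmetric quantization scheme; the function names and scale factors are illustrative, not FAQ-CNN's actual interface. The projection maps floats onto low-bitwidth integers, while the compute kernel works purely on integers, so supporting a different quantization algorithm only requires replacing the projection rule.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>
#include <cstdio>
#include <vector>

// Projection: map a real value onto the signed 8-bit grid (illustrative
// uniform symmetric scheme; other algorithms substitute their own rule).
int8_t project(float x, float scale) {
    long q = std::lround(x / scale);                  // round to nearest
    return static_cast<int8_t>(std::clamp(q, -128L, 127L));
}

// Calculation: an integer-only dot product. This kernel never sees floats,
// so it can be reused unchanged across quantization algorithms.
int32_t dot_q(const std::vector<int8_t>& a, const std::vector<int8_t>& w) {
    int32_t acc = 0;
    for (size_t i = 0; i < a.size(); ++i)
        acc += static_cast<int32_t>(a[i]) * static_cast<int32_t>(w[i]);
    return acc;
}

int main() {
    const float s_a = 0.05f, s_w = 0.02f;             // per-tensor scales
    std::vector<int8_t> a = {project(0.5f, s_a), project(-0.25f, s_a)};
    std::vector<int8_t> w = {project(0.1f, s_w), project(0.3f, s_w)};
    // Dequantize the integer accumulator back to the real domain.
    std::printf("%f\n", dot_q(a, w) * s_a * s_w);     // ~ 0.5*0.1 + (-0.25)*0.3 = -0.025
}
```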
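The encoding idea can be illustrated in miniature as well. The sketch below is a hypothetical example rather than the paper's exact scheme: it packs fixed-bitwidth unsigned values into 64-bit words so that low-bitwidth data can be transferred in full-width bursts, and the lane-wise decoder uses only shifts and masks, which unrolls into fully parallel extraction logic on an FPGA.

```cpp
#include <cstdint>
#include <cstdio>
#include <vector>

// Pack `bits`-bit unsigned values into 64-bit words (bits must divide 64 here).
std::vector<uint64_t> pack(const std::vector<uint8_t>& vals, unsigned bits) {
    const unsigned lanes = 64 / bits;                 // values per word
    const uint64_t mask = (1ULL << bits) - 1;
    std::vector<uint64_t> words((vals.size() + lanes - 1) / lanes, 0);
    for (size_t i = 0; i < vals.size(); ++i)
        words[i / lanes] |= (static_cast<uint64_t>(vals[i]) & mask)
                            << (bits * (i % lanes));
    return words;
}

// Decode every lane of one word; the lanes are independent, so in hardware
// this loop becomes parallel extractors rather than a sequential scan.
void unpack(uint64_t word, unsigned bits, uint8_t* out) {
    const unsigned lanes = 64 / bits;
    const uint64_t mask = (1ULL << bits) - 1;
    for (unsigned l = 0; l < lanes; ++l)
        out[l] = static_cast<uint8_t>((word >> (bits * l)) & mask);
}

int main() {
    std::vector<uint8_t> v = {1, 2, 3, 4, 5, 6, 7, 0};
    auto packed = pack(v, 4);                         // 16 4-bit lanes per word
    uint8_t dec[16];
    unpack(packed[0], 4, dec);
    for (int i = 0; i < 8; ++i) std::printf("%u ", dec[i]);   // 1 2 3 4 5 6 7 0
    std::printf("\n");
}
```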
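Finally, the flavor of the design-space search behind the resource allocation model can be shown with a toy example. All cost expressions below are placeholders rather than the paper's model: tiling factors (Tm, Tn) determine DSP and BRAM usage as well as throughput, and because the DSP cost grows monotonically with the tiling factors, a heuristic pruning step can discard the remainder of a search row as soon as the budget is exceeded.

```cpp
#include <cstdio>

struct Design { int tm, tn; double perf; };

int main() {
    const int DSP_BUDGET = 2048, BRAM_BUDGET = 192;   // illustrative budgets
    Design best{0, 0, 0.0};
    long evaluated = 0, pruned = 0;
    for (int tm = 1; tm <= 128; ++tm) {
        for (int tn = 1; tn <= 128; ++tn) {
            // Toy cost models: one DSP per MAC lane, double-buffered tiles.
            if (tm * tn > DSP_BUDGET) {               // monotone in tn, so the
                pruned += 129 - tn;                   // rest of the row is pruned
                break;
            }
            if (2 * (tm + tn) > BRAM_BUDGET) { ++pruned; continue; }
            ++evaluated;
            double perf = 2.0 * tm * tn;              // ops per cycle (toy)
            if (perf > best.perf) best = {tm, tn, perf};
        }
    }
    std::printf("best (Tm=%d, Tn=%d): %.0f ops/cycle; evaluated=%ld pruned=%ld\n",
                best.tm, best.tn, best.perf, evaluated, pruned);
}
```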