    Citation: Li Maowen, Qu Guoyuan, Wei Dazhou, Jia Haipeng. Performance Optimization of Neural Network Convolution Based on GPU Platform[J]. Journal of Computer Research and Development, 2022, 59(6): 1181-1191. DOI: 10.7544/issn1000-1239.20200985

    Performance Optimization of Neural Network Convolution Based on GPU Platform

    Abstract: Image detection and recognition tasks are being applied in more and more production and everyday scenarios, and methods based on convolutional neural networks are widely used because of their high accuracy. However, convolutional neural networks have many weight parameters and high computational requirements, so these applications are constrained on edge computing devices, whose computing power is limited and whose models vary widely. Running high-performance code across platforms and optimizing convolutional neural networks on GPUs are therefore increasingly important. To address the convolution sizes found in convolutional neural networks and the shortcomings of other general matrix multiplication (GEMM) methods, a GEMM optimization method tailored to convolutional-neural-network sizes is proposed, based on tiling size, branch execution, and the ratio of memory access to computation; it is applied to the Winograd algorithm and combined with operator fusion to further optimize convolution. In addition, a traversal-based auto-tuner selects the best-performing convolution operator, and offline compilation, a memory pool, 16-bit quantization, and network-size pruning are combined to improve the performance of convolutional neural networks. Finally, experiments on the AMD V1605B platform verify the effectiveness of the algorithm. Comparisons with other GEMM algorithms and with deep learning networks show that the proposed method achieves better speedups than the GEMM and Winograd algorithms and effectively accelerates convolutional neural networks.
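    The traversal-based auto-tuning step mentioned in the abstract can be pictured with a small host-side sketch: for a given convolution shape, every available operator implementation is run and timed, and the fastest one is kept. This is only a minimal illustration under assumptions of my own; the names ConvShape, ConvOperator, and select_best_operator, the candidate list, and the timing loop are hypothetical and are not taken from the paper, whose actual operators (tiled GEMM, Winograd) and offline-compilation scheme are not reproduced here.

        // Minimal sketch of traversal-based operator selection (all names hypothetical).
        #include <chrono>
        #include <functional>
        #include <iostream>
        #include <limits>
        #include <string>
        #include <vector>

        // Hypothetical description of one convolution problem size.
        struct ConvShape {
            int n, c, h, w, k, r, s;  // batch, in-channels, height, width, filters, kernel height/width
        };

        // A candidate operator: a name plus a callable that runs the convolution once.
        struct ConvOperator {
            std::string name;
            std::function<void(const ConvShape&)> run;  // e.g. tiled GEMM, im2col+GEMM, Winograd
        };

        // Traversal-based selection: time every candidate on the given shape and keep the fastest.
        std::string select_best_operator(const ConvShape& shape,
                                         const std::vector<ConvOperator>& candidates,
                                         int repeats = 5) {
            std::string best;
            double best_ms = std::numeric_limits<double>::max();
            for (const auto& op : candidates) {
                op.run(shape);  // warm-up run, not timed
                auto t0 = std::chrono::steady_clock::now();
                for (int i = 0; i < repeats; ++i) op.run(shape);
                auto t1 = std::chrono::steady_clock::now();
                double ms = std::chrono::duration<double, std::milli>(t1 - t0).count() / repeats;
                if (ms < best_ms) { best_ms = ms; best = op.name; }
            }
            return best;
        }

        int main() {
            ConvShape shape{1, 64, 56, 56, 64, 3, 3};
            std::vector<ConvOperator> candidates = {
                {"tiled_gemm", [](const ConvShape&) { /* placeholder: call a tiled-GEMM convolution */ }},
                {"winograd",   [](const ConvShape&) { /* placeholder: call a Winograd convolution */ }},
            };
            std::cout << "fastest operator: " << select_best_operator(shape, candidates) << "\n";
        }

    In this sketch the selection would be run once per layer shape and the result cached, which matches the idea of tuning ahead of time and reusing the chosen operator at inference; the paper's exact caching and offline-compilation mechanism is not specified here.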

       
