Image detection and recognition tasks have been applied in more and more production and life scenarios. The convolution-based neural network method is widely used because of its high accuracy. However, the convolution neural network has the problems of many weight parameters and high computational requirements, which are limited by the limited computational power and the variety of edge computing devices. Running high-performance codes across platforms, convolutional neural network optimization based on GPU is increasingly important. In view of the insufficiency of convolution scale and other GEMM methods in convolutional neural network, we present a GEMM optimization method for convolutional neural network size optimization based on block size, branch execution, memory access and calculation scale, which can be applied to Wingrad algorithm and operator combination to further optimize convolution. At the same time, the convolution operator with the best performance is selected based on traversal self-tuning, combining offline compilation, memory pool, 16 b quantization, network scale clipping, etc. to improve the performance of convolutional neural network. Finally, experiments are carried out on AMD V1605B platform to verify the effectiveness of the algorithm. By comparing with other GEMM algorithms and deep learning networks, it is verified that this method can achieve better acceleration than GEMM and Winograd algorithms, and can effectively accelerate the convolutional neural network.