ISSN 1000-1239 CN 11-1777/TP

• Artificial Intelligence •

1. (University of Science and Technology of China, Hefei 230022) (zhengwu@mail.ustc.edu.cn)
• Published: 2019-04-01
• Funding:
National Natural Science Foundation of China (61520106005, 61761136014); National Key Research and Development Program of China (2017YFB1010000)

### Research and Optimization of Fast Convolution Algorithm Winograd on Intel Platform

Wu Zheng, An Hong, Jin Xu, Chi Mengxian, Lü Guofeng, Wen Ke, Zhou Xin

1. (University of Science and Technology of China, Hefei 230022)
• Online: 2019-04-01

Abstract: With the rapid development of deep learning, it has been applied extensively in many fields, such as speech processing, image recognition, and natural language understanding, bringing great changes to both scientific research and daily life. Following this trend, Intel launched the second-generation Xeon Phi processor, Intel KNL (Knights Landing), and then released the third generation, Intel KNM (Knights Mill), bringing new impetus and vitality to the flourishing development of deep learning. Through research on and optimization of the fast convolution algorithm Winograd, this paper contributes to improving Intel MKL (math kernel library) DNN (deep neural network) and to advancing deep learning on the Intel platform. Drawing on the characteristics of Intel's latest deep learning platform, such as the AVX-512 instruction set, the high-bandwidth MCDRAM, the various memory/SNC modes, and the two-dimensional mesh of cores, this work designs and optimizes an implementation of the Winograd algorithm by analyzing memory allocation, data dependencies, and related factors. Finally, on one hand, the typical CNN (convolutional neural network) model VGG19 is used to test performance against Intel MKL convolution, achieving more than a two-fold speedup. On the other hand, several common types of convolutions are used to compare performance with Intel MKL DNN and NVIDIA cuDNN, verifying the applicability and practical value of Winograd. The purpose of this paper is to provide guidance for the development of the Intel platform in the field of deep learning.
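To illustrate the algorithm the abstract refers to, the following is a minimal sketch of the one-dimensional Winograd kernel F(2,3), which produces 2 convolution outputs from a 3-tap filter using 4 multiplications instead of 6. It is not the paper's optimized AVX-512/MCDRAM implementation, only the standard textbook formulation Y = Aᵀ[(Gg) ⊙ (Bᵀd)] with the transform matrices from Lavin and Gray's "Fast Algorithms for Convolutional Neural Networks"; the function and variable names are illustrative.

```python
import numpy as np

# Winograd F(2,3) transform matrices (Lavin & Gray).
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)   # input transform
G = np.array([[1.0,  0.0, 0.0],
              [0.5,  0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0,  0.0, 1.0]], dtype=np.float64)   # filter transform
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)    # output (inverse) transform

def winograd_f23(d, g):
    """d: input tile of 4 samples; g: 3-tap filter -> 2 outputs."""
    U = G @ g    # transformed filter (precomputable, reused across all tiles)
    V = BT @ d   # transformed input tile
    M = U * V    # elementwise product: the 4 multiplications
    return AT @ M

def direct_conv(d, g):
    """Reference sliding-window computation (CNN-style correlation)."""
    return np.array([d[0]*g[0] + d[1]*g[1] + d[2]*g[2],
                     d[1]*g[0] + d[2]*g[1] + d[3]*g[2]])
```

In the 2D case used for CNN layers, F(2x2, 3x3) nests these 1D transforms and reduces 36 multiplications per output tile to 16; the optimization work described in the abstract concerns how such tile transforms are laid out in memory and vectorized on KNL/KNM hardware.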