Research and Optimization of Fast Convolution Algorithm Winograd on Intel Platform

武铮; 安虹; 金旭; 迟孟贤; 吕国锋; 文可; 周鑫

doi:10.7544/issn1000-1239.2019.20170932

Abstract

Deep learning has advanced rapidly and is now widely applied in speech processing, image recognition, natural language understanding, and other fields, bringing major changes to research, industry, and daily life. Following this trend, Intel launched the second-generation Xeon Phi processor KNL (Knights Landing) and later released the third-generation Xeon Phi processor KNM (Knights Mill), giving new momentum to the development of deep learning. This paper studies and optimizes the fast convolution algorithm Winograd on Intel platforms and compares it against the convolution performance of Intel MKL (Math Kernel Library) DNN (deep neural network), with the aim of improving the deep neural network interfaces in Intel MKL DNN and advancing deep learning on Intel platforms. The work draws on features of Intel's latest deep learning platform, including the AVX-512 instruction set, the high-speed memory MCDRAM, multiple Memory/SNC modes, and the two-dimensional mesh core structure, and designs and optimizes the Winograd implementation based on an analysis of memory allocation and data scheduling. On one hand, using the typical convolutional neural network (CNN) model VGG19 as a benchmark, the implementation achieves a speedup of more than 2x over the Intel MKL DNN convolution. On the other hand, tests on commonly used convolution types, compared against Intel MKL DNN and NVIDIA cuDNN, show that the Winograd implementation applies well to common convolution types and has practical value. The work is intended to provide guidance for the development of Intel platforms in the field of deep learning.
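The abstract names the Winograd algorithm without showing its core idea, so a minimal sketch may help. The code below implements the standard one-dimensional minimal filtering case F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. It is an illustration only and not the paper's AVX-512 implementation; the function names `winograd_f23` and `direct_f23` are hypothetical, and the filter is applied as a correlation, as is usual in CNNs.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap filter over a 4-element input tile.

    Standard Winograd F(2,3): 4 multiplications instead of 6.
    Illustrative sketch only; names are hypothetical.
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g

    # Filter transform: depends only on g, so it can be precomputed once.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2

    # Input transform: additions and subtractions only.
    v0 = d0 - d2
    v1 = d1 + d2
    v2 = d2 - d1
    v3 = d1 - d3

    # Element-wise products: the only 4 multiplications.
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3

    # Output transform: additions and subtractions only.
    return [m0 + m1 + m2, m1 - m2 - m3]


def direct_f23(d, g):
    """Reference result: direct sliding-window correlation (6 multiplications)."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]
```

Calling `winograd_f23(d, g)` and `direct_f23(d, g)` on the same 4-element tile and 3-tap filter should agree up to floating-point rounding. The 3×3 convolutions that make up VGG19 use the two-dimensional form, which nests this transform along rows and columns.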
