Optimizing Winograd-Based Fast Convolution Algorithm on Phytium Multi-Core CPUs

Wang Qinglin; Li Dongsheng; Mei Songzhu; Lai Zhiquan; Dou Yong

doi:10.7544/issn1000-1239.2020.20200107

Citation: Wang Qinglin, Li Dongsheng, Mei Songzhu, Lai Zhiquan, Dou Yong. Optimizing Winograd-Based Fast Convolution Algorithm on Phytium Multi-Core CPUs[J]. Journal of Computer Research and Development, 2020, 57(6): 1140-1151. DOI: 10.7544/issn1000-1239.2020.20200107

(Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology, Changsha 410073) (College of Computer, National University of Defense Technology, Changsha 410073)

Funds: This work was supported by the National Science and Technology Major Project "Hegaoji" (Core Electronic Devices, High-end Generic Chips and Basic Software) (2018ZX01028101).

Published Date: May 31, 2020

Abstract

Convolutional neural networks (CNNs) are widely used in artificial intelligence fields such as computer vision and natural language processing. Winograd-based fast convolution algorithms have attracted considerable attention because they effectively reduce the computational complexity of the convolution operations in CNNs. As Phytium multi-core CPUs, independently developed by the National University of Defense Technology, are applied in artificial intelligence fields, there is strong demand for high-performance convolution primitives on these CPUs. Based on a study of the architectural characteristics of Phytium multi-core CPUs and the computing characteristics of Winograd-based fast convolution algorithms, this paper proposes a new high-performance parallel Winograd-based fast convolution algorithm. The algorithm does not rely on general matrix multiplication routines and consists of four stages: kernel transformation, input feature map transformation, element-wise multiplication, and inverse transformation of the output feature maps. Data movement across all four stages is collaboratively optimized to improve the memory access performance of the algorithm. Custom data layouts, multi-level parallel data transformation algorithms, and a multi-level parallel matrix multiplication algorithm are also proposed to support this optimization efficiently. The algorithm is tested on two Phytium multi-core CPUs. Compared with the Winograd-based fast convolution implementations in the ARM Compute Library (ACL) and NNPACK, it achieves speedups of 1.05 to 16.11 times and 1.66 to 16.90 times, respectively. Applying the algorithm in the open-source framework MXNet improves the forward-propagation performance of the VGG16 network by 3.01 to 6.79 times.
Keywords: multi-core CPUs, deep learning, convolutional neural networks, Winograd algorithms, parallel algorithms
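
The four stages named in the abstract can be illustrated on a single tile. The sketch below is a minimal single-threaded example, assuming the smallest common case F(2x2, 3x3) and the standard transform matrices from the Winograd convolution literature. The function names (`winograd_tile`, `direct_tile`) and the sample data are hypothetical, and the sketch does not reproduce the paper's data layouts, multi-level parallelization, or Phytium-specific optimizations.

```python
# Winograd F(2x2, 3x3) on one 4x4 input tile, in plain Python.
# Assumption: standard transform matrices B^T, G, A^T from the Winograd
# convolution literature; this is an illustration, not the paper's code.

BT = [[1, 0, -1, 0],
      [0, 1, 1, 0],
      [0, -1, 1, 0],
      [0, 1, 0, -1]]
G = [[1.0, 0.0, 0.0],
     [0.5, 0.5, 0.5],
     [0.5, -0.5, 0.5],
     [0.0, 0.0, 1.0]]
AT = [[1, 1, 1, 0],
      [0, 1, -1, -1]]


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def winograd_tile(d, g):
    """Compute a 2x2 output tile from a 4x4 input tile d and a 3x3 kernel g."""
    U = matmul(matmul(G, g), transpose(G))      # stage 1: kernel transformation (4x4)
    V = matmul(matmul(BT, d), transpose(BT))    # stage 2: input transformation (4x4)
    M = [[U[i][j] * V[i][j] for j in range(4)]  # stage 3: element-wise multiplication
         for i in range(4)]
    return matmul(matmul(AT, M), transpose(AT))  # stage 4: output inverse transformation (2x2)


def direct_tile(d, g):
    """Reference: direct 3x3 sliding-window convolution (CNN style, no kernel flip)."""
    return [[sum(d[i + r][j + s] * g[r][s] for r in range(3) for s in range(3))
             for j in range(2)] for i in range(2)]


# Hypothetical sample data: a ramp image tile and a Sobel-like kernel.
d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]

if __name__ == "__main__":
    print(winograd_tile(d, g))
    print(direct_tile(d, g))
```

For this tile the element-wise stage uses 16 multiplications, whereas the direct computation uses 36 (4 outputs times 9 kernel weights); this gap is the source of the complexity reduction the abstract refers to. In a full network the kernel transformation is computed once per kernel and reused across all tiles.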
