Convolutional Neural Network Construction Method for Embedded FPGAs Oriented Edge Computing

Lu Ye; Chen Yao; Li Tao; Cai Ruichu; Gong Xiaoli

doi:10.7544/issn1000-1239.2018.20170715

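Lu Ye¹, Chen Yao²,⁴, Li Tao¹,³, Cai Ruichu², Gong Xiaoli¹,³

¹(College of Computer and Control Engineering, Nankai University, Tianjin 300350)
²(School of Computers, Guangdong University of Technology, Guangzhou 510006)
³(State Key Laboratory of Computer Architecture (Institute of Computing Technology, Chinese Academy of Sciences), Beijing 100190)
⁴(Advanced Digital Sciences Center, Singapore 138632) (luye@nankai.edu.cn)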

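Funding: National Natural Science Foundation of China (61702286); Natural Science Foundation of Tianjin (14JCQNJC00700, 16ICYIC15200); Open Project of the State Key Laboratory of Computer Architecture (CARCH201504, CARCH201604); Tianjin Major Special Project on Big Data and Cloud Computing (15ZXDSGX00020); Open Project of the Fujian Provincial Key Laboratory of Information Processing and Intelligent Control (MJUKF201733); Tianjin Outstanding Enterprise Science and Technology Commissioner Program (17JCTPJC49500)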

Chinese Library Classification (CLC) number: TP391
Published: 2018-02-28

Abstract

Computation-intensive applications and services are gradually migrating from centralized cloud computing centers to embedded environments at the network edge. Owing to their flexibility and high energy efficiency, FPGAs are widely used in embedded systems for edge computing. Conventional methods for constructing convolutional neural networks (CNNs) on FPGAs suffer from long design cycles and a small optimization space, so they cannot effectively explore the design space of the target hardware accelerator; this limitation is especially pronounced in embedded environments at the network edge. To address it, this paper proposes a general, high-level synthesis (HLS) based method for constructing CNNs on embedded FPGA platforms for edge computing. An accelerator core that is reused across the layers of the network is designed, so that performance-optimized CNN hardware is obtained with low hardware resource consumption. Scalable design, memory optimization and dataflow optimization are then applied to this core as HLS design optimizations. CNNs built with the method on embedded FPGA platforms show, in experiments, some advantages in power consumption and performance over a Xeon E5-1620 CPU and a GTX Titan GPU, which makes them suitable for edge computing environments.
Keywords: edge computing; convolutional neural network; FPGA; high-level synthesis; accelerator core