ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2022, Vol. 59, Issue 3: 518-532. doi: 10.7544/issn1000-1239.20210551

Special Topic: 2022 Special Issue on Storage Systems and Intelligent Processing

• Computer Architecture •

PPO-Based Automated Quantization for ReRAM-Based Hardware Accelerator

Wei Zheng1, Zhang Xingjun1, Zhuo Zhimin2, Ji Zeyu1, Li Yonghao1   

  1 School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049; 2 Beijing Institute of Electronic System Engineering, Beijing 100854 (frank.wei@stu.xjtu.edu.cn)
  • Online: 2022-03-07
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2016YFB0200902).

Abstract: Convolutional neural networks (CNNs) have already exceeded human capabilities in many fields. However, as the memory consumption and computational complexity of CNNs continue to increase, the “memory wall” problem, which constrains data exchange between the processing unit and the memory unit, impedes their deployment in resource-constrained environments such as edge computing and the Internet of Things. ReRAM (resistive RAM)-based hardware accelerators have been widely applied to accelerating matrix-vector multiplication owing to their high density and low power, but they are ill-suited to 32 b floating-point computation, raising the demand for quantization to reduce data precision. Manually determining the bitwidth for each layer is time-consuming; therefore, recent studies leverage reinforcement learning based on DDPG (deep deterministic policy gradient) to perform automated quantization on the FPGA (field programmable gate array) platform, but DDPG needs to convert continuous actions into discrete actions, and resource constraints are met by decreasing the bitwidth layer by layer. This paper proposes PPO (proximal policy optimization)-based automated quantization for ReRAM-based hardware accelerators, which uses a discrete action space to avoid the action-space conversion step. We define a new reward function that enables the PPO agent to automatically learn the optimal quantization policy satisfying the resource constraints, and present software and hardware modifications to support mixed-precision computing. Experimental results show that, compared with coarse-grained quantization, the proposed method reduces hardware cost by 20% to 30% with negligible loss of accuracy. Compared with other automated quantization methods, the proposed method has a shorter search time and further reduces hardware cost by about 4.2% under the same resource constraints. This work provides insights for the co-design of quantization algorithms and hardware accelerators.
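The abstract describes two ingredients of the method: per-layer quantization to an agent-chosen bitwidth, and a reward that lets the PPO agent learn constraint-satisfying policies without manually decreasing bitwidths. A minimal sketch of both, assuming a symmetric per-tensor uniform quantizer and a linear overshoot penalty (`quantize_uniform`, `reward`, and the penalty form are illustrative assumptions; the paper's exact formulas are not reproduced here):

```python
import numpy as np

def quantize_uniform(w, bits):
    """Uniformly quantize a float32 tensor to a signed fixed-point grid
    of the given bitwidth (symmetric, per-tensor scale), then dequantize
    to show the values the accelerator would effectively compute with."""
    qmax = 2 ** (bits - 1) - 1
    max_abs = float(np.max(np.abs(w)))
    scale = max_abs / qmax if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -qmax - 1, qmax)
    return q * scale

def reward(accuracy, hw_cost, budget, penalty=1.0):
    """Hypothetical reward: quantized-model accuracy minus a penalty
    proportional to how far the hardware cost exceeds the budget.
    Within budget the agent is driven purely by accuracy, so it can
    learn to satisfy the resource constraint on its own."""
    overshoot = max(0.0, (hw_cost - budget) / budget)
    return accuracy - penalty * overshoot
```

With a discrete action space, the agent simply picks `bits` for each layer from a finite set (e.g. 2 to 8), avoiding the continuous-to-discrete conversion that DDPG requires.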

Key words: automated quantization, reinforcement learning, ReRAM-based hardware accelerator, neural network, processing in memory
