ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2022, Vol. 59 ›› Issue (3): 518-532.doi: 10.7544/issn1000-1239.20210551

Special Issue: 2022 Special Issue on Storage Systems and Intelligent Processing


PPO-Based Automated Quantization for ReRAM-Based Hardware Accelerator

Wei Zheng1, Zhang Xingjun1, Zhuo Zhimin2, Ji Zeyu1, Li Yonghao1   

  1(School of Computer Science and Technology, Xi’an Jiaotong University, Xi’an 710049); 2(Beijing Institute of Electronic System Engineering, Beijing 100854)
  • Online:2022-03-07
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2016YFB0200902).

Abstract: Convolutional neural networks (CNNs) have already surpassed human capabilities in many fields. However, as the memory consumption and computational complexity of CNNs continue to grow, the “memory wall” problem, which constrains data exchange between the processing unit and the memory unit, impedes their deployment in resource-constrained environments such as edge computing and the Internet of Things. ReRAM (resistive RAM)-based hardware accelerators have been widely applied to accelerate matrix-vector multiplication thanks to their high density and low power, but they are ill-suited to 32-bit floating-point computation, which raises the demand for quantization to reduce data precision. Manually determining the bitwidth for each layer is time-consuming; therefore, recent studies leverage DDPG (deep deterministic policy gradient) to perform automated quantization on FPGA (field programmable gate array) platforms, but DDPG must convert its continuous actions into discrete actions, and resource constraints are enforced by manually decreasing the bitwidth of each layer. This paper proposes PPO (proximal policy optimization)-based automated quantization for ReRAM-based hardware accelerators, which uses a discrete action space and thus avoids the action-space conversion step. We define a new reward function that enables the PPO agent to automatically learn the optimal quantization policy satisfying the resource constraints, and we present software-hardware modifications to support mixed-precision computing. Experimental results show that, compared with coarse-grained quantization, the proposed method reduces hardware cost by 20%~30% with negligible loss of accuracy. Compared with other automated quantization methods, the proposed method requires a shorter search time and further reduces hardware cost by about 4.2% under the same resource constraints. These results provide insights for the co-design of quantization algorithms and hardware accelerators.
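The constrained reward described in the abstract — letting the PPO agent itself learn quantization policies that fit a hardware budget, rather than manually trimming bitwidths — can be sketched as follows. This is a minimal illustrative sketch: the cost model, function names, penalty form, and candidate bitwidths are assumptions for exposition, not the paper's exact formulation.

```python
# Hypothetical sketch of reward shaping for resource-constrained
# automated quantization with a discrete action space.

# Discrete action space: candidate per-layer bitwidths the PPO agent
# can choose from (assumed values, not taken from the paper).
ACTIONS = [2, 4, 6, 8]

def hardware_cost(bitwidths, layer_sizes):
    """Toy cost model: total bits stored across layers.
    A real model would account for ReRAM crossbar allocation,
    ADC/DAC precision, and peripheral circuitry."""
    return sum(b * n for b, n in zip(bitwidths, layer_sizes))

def reward(accuracy, bitwidths, layer_sizes, budget, penalty=1.0):
    """Reward = quantized-model accuracy minus a penalty proportional
    to the relative violation of the resource budget, so policies that
    exceed the budget are discouraged without manual intervention."""
    cost = hardware_cost(bitwidths, layer_sizes)
    over = max(0.0, (cost - budget) / budget)  # relative overshoot
    return accuracy - penalty * over
```

Under this shaping, a policy that stays within the budget is rewarded purely by its accuracy, while an over-budget policy is penalized in proportion to how far it overshoots; the PPO agent samples bitwidths directly from the discrete `ACTIONS` set, so no continuous-to-discrete conversion step is needed.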

Key words: automated quantization, reinforcement learning, ReRAM-based hardware accelerator, neural network, processing in memory
