    Wei Zheng, Zhang Xingjun, Zhuo Zhimin, Ji Zeyu, Li Yonghao. PPO-Based Automated Quantization for ReRAM-Based Hardware Accelerator[J]. Journal of Computer Research and Development, 2022, 59(3): 518-532. DOI: 10.7544/issn1000-1239.20210551

    PPO-Based Automated Quantization for ReRAM-Based Hardware Accelerator

    Convolutional neural networks (CNNs) have already exceeded human capabilities in many fields. However, as the memory consumption and computational complexity of CNNs continue to increase, the "memory wall" problem, which constrains data exchange between the processing unit and the memory unit, impedes their deployment in resource-constrained environments such as edge computing and the Internet of Things. ReRAM (resistive RAM)-based hardware accelerators have been widely applied to accelerate matrix-vector multiplication owing to their high density and low power, but they are not well suited to 32-bit floating-point computation, which raises the demand for quantization to reduce data precision. Manually determining the bitwidth for each layer is time-consuming; therefore, recent studies leverage DDPG (deep deterministic policy gradient) to perform automated quantization on the FPGA (field programmable gate array) platform, but DDPG must convert continuous actions into discrete ones, and resource constraints are met by manually decreasing the bitwidth of each layer. This paper proposes a PPO (proximal policy optimization)-based automated quantization method for ReRAM-based hardware accelerators, which uses a discrete action space to avoid the action-space conversion step. We define a new reward function that enables the PPO agent to automatically learn the optimal quantization policy satisfying the resource constraints, and we describe software and hardware modifications that support mixed-precision computing. Experimental results show that, compared with coarse-grained quantization, the proposed method reduces hardware cost by 20%~30% with negligible loss of accuracy. Compared with other automated quantization methods, the proposed method has a shorter search time and further reduces hardware cost by about 4.2% under the same resource constraints. This provides insights for the co-design of quantization algorithms and hardware accelerators.
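    To make the two central ideas concrete, the sketch below shows what a discrete per-layer bitwidth action space and a budget-penalized reward could look like. The cost model, penalty form, and every name and number in it (CANDIDATE_BITS, hardware_cost, the example layer sizes) are illustrative assumptions for exposition, not the paper's actual formulation.

```python
# Illustrative sketch only: a discrete per-layer bitwidth action space and a
# reward that folds the resource constraint into the learning signal, so the
# agent learns constraint-satisfying policies instead of having bitwidths
# decreased by hand afterwards. All names and numbers are assumptions.

CANDIDATE_BITS = [2, 3, 4, 5, 6, 7, 8]   # discrete actions: one bitwidth per layer

def hardware_cost(bitwidths, layer_sizes):
    """Toy proxy for ReRAM crossbar cost: total stored weight bits."""
    return sum(b * n for b, n in zip(bitwidths, layer_sizes))

def reward(accuracy, bitwidths, layer_sizes, budget, penalty=1.0):
    """Quantized-model accuracy minus a penalty for exceeding the budget."""
    over = max(0.0, (hardware_cost(bitwidths, layer_sizes) - budget) / budget)
    return accuracy - penalty * over

# Example: a 3-layer network, a budget of 4 bits per weight on average,
# and one candidate policy (one discrete action per layer).
layer_sizes = [4608, 73728, 512]
policy_bits = [8, 4, 6]
print(reward(accuracy=0.71, bitwidths=policy_bits,
             layer_sizes=layer_sizes, budget=4 * sum(layer_sizes)))
```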