• China's Excellent Science and Technology Journal
  • CCF-Recommended Class A Chinese Journal
  • T1-Class High-Quality Science and Technology Journal in Computing
Wei Zheng, Zhang Xingjun, Zhuo Zhimin, Ji Zeyu, Li Yonghao. PPO-Based Automated Quantization for ReRAM-Based Hardware Accelerator[J]. Journal of Computer Research and Development, 2022, 59(3): 518-532. DOI: 10.7544/issn1000-1239.20210551

PPO-Based Automated Quantization for ReRAM-Based Hardware Accelerator

Funds: This work was supported by the National Key Research and Development Program of China (2016YFB0200902).
  • Published Date: February 28, 2022
  • Abstract: Convolutional neural networks (CNNs) have surpassed human capabilities in many fields. However, as the memory consumption and computational complexity of CNNs continue to grow, the "memory wall" problem, which constrains data exchange between processing and memory units, impedes their deployment in resource-constrained environments such as edge computing and the Internet of Things. ReRAM (resistive RAM)-based hardware accelerators have been widely applied to accelerate matrix-vector multiplication thanks to their high density and low power, but they are ill-suited to 32-bit floating-point computation, raising the demand for quantization to reduce data precision. Manually determining the bitwidth for each layer is time-consuming; recent studies therefore leverage DDPG (deep deterministic policy gradient) to perform automated quantization on FPGA (field-programmable gate array) platforms, but DDPG must convert continuous actions into discrete ones, and resource constraints are met only by manually decreasing the bitwidth of each layer. This paper proposes a PPO (proximal policy optimization)-based automated quantization method for ReRAM-based hardware accelerators, which uses a discrete action space and thus avoids the action-space conversion step. We define a new reward function that enables the PPO agent to automatically learn the optimal quantization policy satisfying the resource constraints, and we present software-hardware modifications to support mixed-precision computing. Experimental results show that, compared with coarse-grained quantization, the proposed method reduces hardware cost by 20%~30% with negligible loss of accuracy. Compared with other automated quantization methods, it has a shorter search time and further reduces hardware cost by about 4.2% under the same resource constraints. These results provide insights for the co-design of quantization algorithms and hardware accelerators.
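The idea summarized in the abstract can be illustrated with a minimal sketch: a discrete action space of candidate per-layer bitwidths, a reward that folds the resource constraint into a penalty term (so the agent, not a manual pass, enforces the budget), and PPO's clipped surrogate objective. All concrete numbers below (candidate bitwidths, layer costs, budget, penalty weight, and the toy accuracy/cost models) are illustrative assumptions, not the paper's actual configuration.

```python
import numpy as np

# Assumed discrete action space: one candidate bitwidth per layer.
BITWIDTHS = [2, 4, 6, 8]
# Hypothetical relative ReRAM crossbar cost per layer.
LAYER_WEIGHTS = [1.0, 2.0, 4.0]
BUDGET = 24.0    # resource constraint, arbitrary units
PENALTY = 0.1    # weight of the constraint-violation term

def hardware_cost(bits_per_layer):
    """Toy cost model: cost grows linearly with bitwidth in each layer."""
    return sum(b * w for b, w in zip(bits_per_layer, LAYER_WEIGHTS))

def accuracy_proxy(bits_per_layer):
    """Stand-in for post-quantization accuracy: more bits, higher score."""
    return sum(1.0 - 2.0 ** (-b) for b in bits_per_layer) / len(bits_per_layer)

def reward(bits_per_layer):
    """Accuracy proxy minus a penalty when the budget is exceeded, so the
    agent learns constraint-satisfying policies without manual bitwidth
    reduction."""
    over = max(0.0, hardware_cost(bits_per_layer) - BUDGET)
    return accuracy_proxy(bits_per_layer) - PENALTY * over

def ppo_clip_objective(logp_new, logp_old, advantage, eps=0.2):
    """PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A),
    where r is the new/old policy probability ratio."""
    ratio = np.exp(logp_new - logp_old)
    return np.minimum(ratio * advantage,
                      np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage)
```

Because the actions are already discrete bitwidth choices, no continuous-to-discrete conversion is needed, which is the advantage the abstract claims over DDPG-based search.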
  • Related Articles

    [1]Liu He, Ji Yu, Han Jianhui, Zhang Youhui, Zheng Weimin. Training and Software Simulation for ReRAM-Based LSTM Neural Network Acceleration[J]. Journal of Computer Research and Development, 2019, 56(6): 1182-1191. DOI: 10.7544/issn1000-1239.2019.20190113
    [2]Fang Rongqiang, Wang Jing, Yao Zhicheng, Liu Chang, Zhang Weigong. Modeling Computational Feature of Multi-Layer Neural Network[J]. Journal of Computer Research and Development, 2019, 56(6): 1170-1181. DOI: 10.7544/issn1000-1239.2019.20190111
    [3]Mao Haiyu, Shu Jiwu. 3D Memristor Array Based Neural Network Processing in Memory Architecture[J]. Journal of Computer Research and Development, 2019, 56(6): 1149-1160. DOI: 10.7544/issn1000-1239.2019.20190099
    [4]Chen Guilin, Ma Sheng, Guo Yang. Survey on Accelerating Neural Network with Hardware[J]. Journal of Computer Research and Development, 2019, 56(2): 240-253. DOI: 10.7544/issn1000-1239.2019.20170852
    [5]Wang Chenxi, Lü Fang, Cui Huimin, Cao Ting, John Zigman, Zhuang Liangji, Feng Xiaobing. Heterogeneous Memory Programming Framework Based on Spark for Big Data Processing[J]. Journal of Computer Research and Development, 2018, 55(2): 246-264. DOI: 10.7544/issn1000-1239.2018.20170687
    [6]Li Chuxi, Fan Xiaoya, Zhao Changhe, Zhang Shengbing, Wang Danghui, An Jianfeng, Zhang Meng. A Memristor-Based Processing-in-Memory Architecture for Deep Convolutional Neural Networks Approximate Computation[J]. Journal of Computer Research and Development, 2017, 54(6): 1367-1380. DOI: 10.7544/issn1000-1239.2017.20170099
    [7]Bian Chen, Yu Jiong, Xiu Weirong, Qian Yurong, Ying Changtian, Liao Bin. Partial Data Shuffled First Strategy for In-Memory Computing Framework[J]. Journal of Computer Research and Development, 2017, 54(4): 787-803. DOI: 10.7544/issn1000-1239.2017.20160049
    [8]Liu Zhibin, Zeng Xiaoqin, Liu Huiyi, Chu Rong. A Heuristic Two-layer Reinforcement Learning Algorithm Based on BP Neural Networks[J]. Journal of Computer Research and Development, 2015, 52(3): 579-587. DOI: 10.7544/issn1000-1239.2015.20131270
    [9]Li Ning, Xie Zhenhua, Xie Junyuan, Chen Shifu. SEFNN—A Feed-Forward Neural Network Design Algorithm Based on Structure Evolution[J]. Journal of Computer Research and Development, 2006, 43(10): 1713-1718.
    [10]Li Kai, Huang Houkuan. A Selective Approach to Neural Network Ensemble Based on Clustering Technology[J]. Journal of Computer Research and Development, 2005, 42(4): 594-598.
  • Cited by

    Periodical cited type (2)

    1. Zhao Anning, Xu Nuo, Liu Kang, Luo Li, Pan Bingzheng, Bo Ziyi, Tan Chenghao. Multi-State Logic Gate Synthesis for Low-Wear In-Memory Computing. Journal of Computer Research and Development. 2025(03): 620-632.
    2. Wei Huaming, Liao Jianping. Simulation of Cloud Server Performance Acceleration Methods for Massive Data Storage. Computer Simulation. 2023(05): 515-519.

    Other cited types (1)

