ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2019, Vol. 56 ›› Issue (6): 1182-1191. doi: 10.7544/issn1000-1239.2019.20190113

Special topic: 2019 Special Topic on Computer Architecture for Artificial Intelligence

• System Architecture •



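  1. 1(Department of Computer Science and Technology, Tsinghua University, Beijing 100084);2(Institute of Microelectronics, Tsinghua University, Beijing 100084)
  • Published: 2019-06-01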

Training and Software Simulation for ReRAM-Based LSTM Neural Network Acceleration

Liu He1, Ji Yu1, Han Jianhui2, Zhang Youhui1, Zheng Weimin1   

  1. 1(Department of Computer Science and Technology, Tsinghua University, Beijing 100084);2(Institute of Microelectronics, Tsinghua University, Beijing 100084)
  • Online: 2019-06-01
  • Supported by: 
    This work was supported by the Science and Technology Innovation Special Zone Project.

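Abstract (from the Chinese original): Long short-term memory (LSTM) networks are a type of recurrent neural network that excel at processing and predicting events separated by long intervals and delays in time series, and they are widely used in speech recognition, machine translation, and related fields. However, constrained by memory bandwidth, the computing patterns of most current neural network accelerators cannot process LSTM computation efficiently. The resistive random-access memory (ReRAM) crossbar structure, by contrast, can perform efficient, high-density vector-matrix multiplication as in-memory computation, which makes it a highly promising accelerator design paradigm for LSTM networks. This paper studies a simulation tool for ReRAM-oriented LSTM neural network accelerators and a corresponding neural network training algorithm. The tool can simulate, in a clock-driven manner, a designer-proposed LSTM accelerator microarchitecture whose core acceleration component is the ReRAM crossbar, thereby enabling design space exploration; the neural network training algorithm is also modified to suit the characteristics of ReRAM. The tool is implemented in System-C, and its core computation is accelerated on graphics processing units, which speeds up the simulation of ReRAM devices and facilitates design space exploration.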

Keywords (from the Chinese original): ReRAM, long short-term memory network, training algorithm, simulation framework, neural network

Abstract: Long short-term memory (LSTM) is mostly used in fields such as speech recognition and machine translation, owing to its strength in processing and predicting events with long intervals and long delays in time series. However, most existing neural network acceleration chips cannot perform LSTM computation efficiently because they are limited by low memory bandwidth. ReRAM-based crossbars, on the other hand, can process matrix-vector multiplication efficiently because they compute in memory (processing in memory, PIM). A software tool for broad architectural exploration and end-to-end evaluation of ReRAM-based LSTM acceleration is nevertheless still missing. This paper proposes a simulator for ReRAM-based LSTM neural network acceleration and a corresponding training algorithm. The main features (including imperfections) of ReRAM devices and circuits are reflected by the highly configurable tools, and the core computation of the simulation can be accelerated by a general-purpose graphics processing unit (GPGPU). Moreover, the core component of the simulator has been verified against the corresponding circuit simulation of a real chip design. Within this framework, architectural exploration and comprehensive end-to-end evaluation can be achieved.

Key words: ReRAM, long short-term memory (LSTM), training algorithm, simulation framework, neural network
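
As an illustration of the in-memory matrix-vector multiplication that the abstract attributes to ReRAM crossbars, the following is a minimal sketch in plain Python. It is not the paper's simulator: the function names (`quantize`, `crossbar_vmm`), the differential-pair weight mapping, the evenly spaced conductance levels, and the Gaussian noise model are all assumptions chosen for illustration.

```python
import random


def quantize(value, max_abs, levels):
    """Snap a value in [0, max_abs] to one of `levels` evenly spaced steps.

    Assumes levels >= 2 (a hypothetical, idealized device model).
    """
    if max_abs == 0:
        return 0.0
    step = max_abs / (levels - 1)
    return round(value / step) * step


def crossbar_vmm(voltages, weights, levels=16, noise_std=0.0, seed=0):
    """Idealized crossbar: column current j = sum_i voltages[i] * G[i][j].

    Each signed weight is split into a positive and a negative conductance
    (a differential pair, assumed here), each quantized to `levels` states
    and optionally perturbed by Gaussian noise to mimic device variation.
    """
    rng = random.Random(seed)
    max_abs = max(abs(w) for row in weights for w in row)
    currents = [0.0] * len(weights[0])
    for v, row in zip(voltages, weights):
        for j, w in enumerate(row):
            g_pos = quantize(max(w, 0.0), max_abs, levels)
            g_neg = quantize(max(-w, 0.0), max_abs, levels)
            if noise_std > 0.0:
                g_pos += rng.gauss(0.0, noise_std)
                g_neg += rng.gauss(0.0, noise_std)
            # Currents from all rows accumulate on the shared column wire.
            currents[j] += v * (g_pos - g_neg)
    return currents


if __name__ == "__main__":
    w = [[1.0, -0.5], [0.0, 1.0]]
    print(crossbar_vmm([1.0, 2.0], w, levels=3))  # weights fit the 3 levels exactly
    print(crossbar_vmm([1.0, 2.0], w, levels=2))  # coarser levels distort the result
```

Comparing the two calls shows why a training algorithm may need to account for device characteristics: with too few conductance levels, the stored weights no longer match the trained ones, and the column currents drift from the exact product.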