    Li Chuxi, Fan Xiaoya, Zhao Changhe, Zhang Shengbing, Wang Danghui, An Jianfeng, Zhang Meng. A Memristor-Based Processing-in-Memory Architecture for Deep Convolutional Neural Networks Approximate Computation[J]. Journal of Computer Research and Development, 2017, 54(6): 1367-1380. DOI: 10.7544/issn1000-1239.2017.20170099

    A Memristor-Based Processing-in-Memory Architecture for Deep Convolutional Neural Networks Approximate Computation

    • Abstract: The memristor fuses storage and computation in a single device and can therefore be used to build processing-in-memory (PIM) architectures that integrate the two. However, owing to limitations of the compute arrays and of the structural mapping method, deep neural network computation on memristor arrays requires frequent AD/DA conversion and large amounts of intermediate storage, which incurs significant energy and area overhead. This work proposes a new memristor-based PIM architecture for approximate computation of deep convolutional neural networks. It uses analog memristors to greatly increase data density and decomposes the convolution process onto memristor arrays of different forms that compute separately, which increases data parallelism, reduces the number of data conversions, and eliminates intermediate storage, thereby achieving both speedup and energy saving. Optimization strategies are given for the accuracy loss that may arise in this architecture. Simulation experiments on neural networks of different scales and depths show that, at the same computational accuracy, the architecture can reduce energy consumption by up to more than 90% while improving computational performance by about 90%.
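    To make the dataflow described above concrete, the following is a minimal NumPy sketch written for this page (not the authors' code): a convolution is lowered to matrix-vector products on memristor crossbars, each weight sub-array (WSA) holds a slice of the unrolled kernels and produces a partial sum, and an accumulate stage standing in for the ASA adds the partial sums. All function names, array sizes, and the number of sub-arrays are assumptions for illustration only.

        # Illustrative sketch only (assumed shapes and helper names, not the paper's code).
        import numpy as np

        def im2col(x, k):
            """Unroll every k x k patch of a single-channel feature map into a column."""
            h, w = x.shape
            cols = [x[i:i + k, j:j + k].ravel()
                    for i in range(h - k + 1) for j in range(w - k + 1)]
            return np.array(cols).T                          # (k*k, num_patches)

        def crossbar_mvm(conductance, voltage):
            """One analog crossbar step: column currents = G^T * V (Kirchhoff summation)."""
            return conductance.T @ voltage

        def conv_via_crossbars(x, kernels, k=3, num_wsa=3):
            """Each weight sub-array (WSA) stores a slice of the unrolled kernels and
            yields a partial sum; the accumulate stage (ASA) adds the partial sums."""
            patches = im2col(x, k)                           # inputs applied as voltages
            g = kernels.reshape(len(kernels), -1).T          # (k*k, num_kernels) conductances
            out = np.zeros((len(kernels), patches.shape[1]))
            for rows in np.array_split(np.arange(k * k), num_wsa):
                out += crossbar_mvm(g[rows], patches[rows])  # ASA accumulates WSA outputs
            side = x.shape[0] - k + 1
            return out.reshape(len(kernels), side, side)

        x = np.random.rand(8, 8)
        kernels = np.random.rand(4, 3, 3)
        print(conv_via_crossbars(x, kernels).shape)          # (4, 6, 6)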

       

      Abstract: The memristor is one of the most promising candidates for building processing-in-memory (PIM) structures. Memristor-based PIM with digital or multi-level memristors has been proposed for neuromorphic computing, but the frequent AD/DA conversion and intermediate memory these structures require lead to significant energy and area overhead. To address this issue, a memristor-based PIM architecture for deep convolutional neural networks (CNNs) is proposed in this work. It exploits an analog architecture to eliminate data conversion in neuron layer banks, each of which consists of two special modules named weight sub-arrays (WSAs) and accumulate sub-arrays (ASAs). The partial sums of neuron inputs are generated in the WSAs concurrently and written into the ASAs continuously, where the final results are computed. The noise in the proposed analog-style architecture is analyzed quantitatively at both the model and circuit levels, and a synthetic solution is presented to suppress it: the nonlinear distortion of the weights is calibrated with a corrective function, the write module is pre-charged to reduce parasitic effects, and the remaining noise is handled by a modified noise-aware training scheme. The proposed design has been evaluated on various neural network benchmarks; the results show that, compared with digital solutions, both energy efficiency and performance can be improved by about 90% on specific neural networks without accuracy loss.
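      The software-side noise countermeasures named above can also be illustrated with a short sketch (again an assumption-laden illustration, not the published method): a corrective function pre-distorts target weights to cancel an assumed nonlinear write response, and a noise-aware training loop injects device noise into the forward pass so the learned weights tolerate it. The tanh device model, the constant BETA, the noise level sigma, and the toy linear layer are all assumptions.

        # Illustrative sketch only; the device model and constants are assumptions.
        import numpy as np

        BETA = 1.5    # assumed strength of the nonlinear write response

        def device_write(target):
            """Assumed memristor write model: programmed conductance saturates nonlinearly."""
            return np.tanh(BETA * target) / BETA

        def corrective(target):
            """Corrective (inverse) function: pre-distort targets so device_write lands on them."""
            return np.arctanh(np.clip(BETA * target, -0.999, 0.999)) / BETA

        def noisy_forward(x, w, sigma=0.02, rng=np.random.default_rng(0)):
            """Noise-aware forward pass: evaluate with calibrated, noise-perturbed weights."""
            w_stored = device_write(corrective(w))                # calibrated write
            w_noisy = w_stored + rng.normal(0.0, sigma, w.shape)  # device variation
            return x @ w_noisy

        # Toy noise-aware training of one linear layer (straight-through gradient w.r.t. w).
        rng = np.random.default_rng(1)
        x = rng.normal(size=(64, 8))
        w_true = 0.4 * rng.normal(size=(8, 1))
        y = x @ w_true
        w = np.zeros((8, 1))
        for _ in range(200):
            grad = x.T @ (noisy_forward(x, w) - y) / len(x)       # mean-squared-error gradient
            w -= 0.1 * grad
        print(float(np.mean((x @ device_write(corrective(w)) - y) ** 2)))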

       
