Asynchronous Network-on-Chip Architecture for Neuromorphic Processor

Yang Zhijie, Wang Lei, Shi Wei, Peng Linghui, Wang Yao, Xu Weixia

Citation: Yang Zhijie, Wang Lei, Shi Wei, Peng Linghui, Wang Yao, Xu Weixia. Asynchronous Network-on-Chip Architecture for Neuromorphic Processor[J]. Journal of Computer Research and Development, 2023, 60(1): 17-29. DOI: 10.7544/issn1000-1239.202111032. CSTR: 32373.14.issn1000-1239.202111032

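Funds: This work was supported by the National Key Research and Development Programs of China (2018YFB2202603, 2020AAA0104602).
Details
    About the authors:

    Yang Zhijie: born in 1995. PhD candidate. Main research interests include neuromorphic computing and computer architecture.

    Wang Lei: born in 1977. PhD, associate research fellow. Senior member of CCF. Main research interests include microprocessor design and neuromorphic computing.

    Shi Wei: born in 1982. PhD, associate research fellow. Main research interests include computer architecture, microprocessor design, and information security.

    Peng Linghui: born in 1997. Engineer. Main research interests include neuromorphic computing and network-on-chip design.

    Wang Yao: born in 1983. PhD, assistant research fellow. Main research interests include very-large-scale integrated circuits, microprocessor design and implementation, and semiconductor device properties and reliability.

    Xu Weixia: born in 1963. PhD, research fellow, PhD supervisor. Member of CCF. Main research interests include high-performance computer architecture.

    Corresponding author:

    Wang Lei (leiwang@nudt.edu.cn)

  • CLC number: TP389.1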

  • Abstract:

    Neuromorphic processors offer higher energy efficiency than traditional deep learning processors. Their on-chip interconnect is generally a network-on-chip (NoC), which provides high scalability, high throughput, and high versatility. A synchronous NoC driven by a global clock tree struggles to achieve timing closure, while an asynchronous NoC faces link delay matching problems and lacks electronic design automation (EDA) tool support for implementation and verification. To address these problems, we propose NosralC, a low-power asynchronous NoC architecture for building global-asynchronous-local-synchronous (GALS) multi-core neuromorphic processors. NosralC is implemented with asynchronous links and synchronous routers. Because only a small part of the design is asynchronous, NosralC stays close to a synchronous design and can be implemented and validated with existing EDA tools. Experiments show that, compared with a functionally equivalent synchronous baseline, NosralC achieves a 37.5% to 38.9% reduction in power consumption, a 5.5% to 8.0% reduction in average latency, and a 36.7% to 47.6% improvement in energy efficiency on the FSDD, DVS128 Gesture, NTI-DIGITS, and NMNIST neuromorphic datasets, at the cost of less than 6% additional resources and a small performance overhead (a 0.8% to 2.4% decrease in throughput). NosralC has been verified on a field programmable gate array (FPGA) platform, which demonstrates that the architecture is implementable.

  • Figure 1. Four-phase handshake protocol

    Figure 2. Two-phase handshake protocol

    Figure 3. Single-rail data coding

    Figure 4. Dual-rail data coding

    Figure 5. SNN on-chip execution pattern in units of time steps

    Figure 6. Number of fired neurons over 1000 execution time steps

    Figure 7. Block diagram of the NoC architecture for on-chip interconnection and communication of the neuromorphic processor

    Figure 8. Block diagram of the 5-port duplex router design

    Figure 9. Design of the data packet format

    Figure 10. Design of the input module and schematic of the X-Y routing algorithm

    Figure 11. Design of the output module and schematic of the round-robin arbitration mechanism

    Figure 12. Block diagram of the data path and control circuit of the asynchronous link

    Figure 13. Write channel of the data conversion circuit between the synchronous and asynchronous domains

    Figure 14. Read channel of the data conversion circuit between the synchronous and asynchronous domains

    Figure 15. Design flow of the GALS-based NosralC architecture

    Figure 16. Throughput variation under different injection rates
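
Figure 10 names X-Y routing but this page does not describe it. The following is a minimal sketch of standard dimension-order (X-Y) routing on a 2D mesh with a 5-port router, assuming the textbook form of the algorithm. The port names, the coordinate orientation (north means increasing y), and the function names are illustrative assumptions and are not taken from the paper.

```python
from enum import Enum


class Port(Enum):
    """The five ports of a mesh router (names are hypothetical)."""
    LOCAL = "local"
    EAST = "east"
    WEST = "west"
    NORTH = "north"
    SOUTH = "south"


# Coordinate change produced by leaving a router through each port.
# Assumption: x grows eastward and y grows northward.
STEP = {
    Port.EAST: (1, 0),
    Port.WEST: (-1, 0),
    Port.NORTH: (0, 1),
    Port.SOUTH: (0, -1),
}


def xy_route(current, destination):
    """Pick the output port for a packet at `current` bound for `destination`.

    The packet first travels along X until the column matches,
    then along Y, and is finally delivered to the local core.
    """
    cx, cy = current
    dx, dy = destination
    if dx > cx:
        return Port.EAST
    if dx < cx:
        return Port.WEST
    if dy > cy:
        return Port.NORTH
    if dy < cy:
        return Port.SOUTH
    return Port.LOCAL


def xy_path(source, destination):
    """Return the list of router coordinates a packet visits."""
    path = [source]
    current = source
    while True:
        port = xy_route(current, destination)
        if port is Port.LOCAL:
            return path
        ox, oy = STEP[port]
        current = (current[0] + ox, current[1] + oy)
        path.append(current)


if __name__ == "__main__":
    print(xy_path((0, 0), (2, 1)))
```

Because every packet resolves the X offset before the Y offset, the hop count equals the Manhattan distance between source and destination, and the routing decision needs only the current and destination coordinates.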

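    Table 1   Configuration of the Liquid State Machine SNN

    | Parameter | Number/Type |
    |---|---|
    | Number of input-layer neurons | 256 |
    | Number of liquid-layer neurons | 1000 |
    | Number of excitatory neurons in the liquid layer | 800 |
    | Number of inhibitory neurons in the liquid layer | 200 |
    | Output layer | 1000×10 fully connected layer |
    | Excitatory → excitatory connection probability | 0.4 |
    | Excitatory → inhibitory connection probability | 0.4 |
    | Inhibitory → excitatory connection probability | 0.5 |
    | Inhibitory → inhibitory connection probability | 0 |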
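
The connection probabilities in Table 1 can be turned into a random liquid-layer wiring. The sketch below is one possible reading, assuming independent Bernoulli draws per ordered neuron pair, no self-connections, and excitatory neurons indexed before inhibitory ones. None of these choices, nor the names `P_CONNECT` and `build_liquid`, come from the paper.

```python
import random

# Liquid-layer population sizes from Table 1.
N_EXCITATORY = 800
N_INHIBITORY = 200

# Connection probability indexed by (presynaptic kind, postsynaptic kind),
# with "E" for excitatory and "I" for inhibitory, as listed in Table 1.
P_CONNECT = {
    ("E", "E"): 0.4,
    ("E", "I"): 0.4,
    ("I", "E"): 0.5,
    ("I", "I"): 0.0,
}


def build_liquid(seed=0):
    """Randomly wire the liquid layer.

    Returns (kinds, synapses), where kinds[i] is "E" or "I" and synapses
    is a list of (pre, post) neuron index pairs.
    Assumptions: each ordered pair is drawn independently and a neuron
    never connects to itself.
    """
    rng = random.Random(seed)
    kinds = ["E"] * N_EXCITATORY + ["I"] * N_INHIBITORY
    synapses = []
    for pre, pre_kind in enumerate(kinds):
        for post, post_kind in enumerate(kinds):
            if pre == post:
                continue
            if rng.random() < P_CONNECT[(pre_kind, post_kind)]:
                synapses.append((pre, post))
    return kinds, synapses


if __name__ == "__main__":
    kinds, synapses = build_liquid(seed=0)
    print(len(kinds), "neurons,", len(synapses), "synapses")
```

With a probability of 0 for inhibitory-to-inhibitory pairs, the draw `rng.random() < 0.0` never succeeds, so that class of connection is absent by construction.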

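    Table 2   Power Comparison Between Synchronous Counterpart Baseline and NosralC

    | Test dataset | Power type | Synchronous counterpart power/mW | NosralC power/mW | Reduction/% |
    |---|---|---|---|---|
    | FSDD | Static | 165.0 | 96.7 | 41.4 |
    | FSDD | Dynamic | 21.4 | 19.8 | 7.5 |
    | FSDD | Total | 186.4 | 116.5 | 37.5 |
    | NTI-DIGITS | Static | 165.0 | 96.7 | 41.4 |
    | NTI-DIGITS | Dynamic | 19.7 | 16.3 | 17.4 |
    | NTI-DIGITS | Total | 187.4 | 113.0 | 38.8 |
    | DVS128 Gesture | Static | 165.0 | 96.6 | 41.5 |
    | DVS128 Gesture | Dynamic | 18.8 | 17.9 | 4.5 |
    | DVS128 Gesture | Total | 183.8 | 114.5 | 37.7 |
    | NMNIST | Static | 166.0 | 96.9 | 41.6 |
    | NMNIST | Dynamic | 24.8 | 19.7 | 20.6 |
    | NMNIST | Total | 190.8 | 116.6 | 38.9 |
    | Average | Static | 164.0 | 96.1 | 41.4 |
    | Average | Dynamic | 20.8 | 18.2 | 12.4 |
    | Average | Total | 184.8 | 114.3 | 38.1 |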

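    Table 3   Average Latency Comparison Between Synchronous Counterpart Baseline and NosralC

    | Test dataset | Synchronous counterpart average latency/cycles | NosralC average latency/cycles | Reduction/% |
    |---|---|---|---|
    | FSDD | 1098 | 1010 | 8.0 |
    | NTI-DIGITS | 1035 | 952 | 8.0 |
    | DVS128 Gesture | 1097 | 1036 | 5.5 |
    | NMNIST | 1062 | 996 | 6.2 |
    | Average | 1072.9 | 998.7 | 6.9 |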

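    Table 4   Throughput Comparison Between Synchronous Counterpart Baseline and NosralC

    | Test dataset | Synchronous counterpart throughput/(packets/cycle) | NosralC throughput/(packets/cycle) | Reduction/% |
    |---|---|---|---|
    | FSDD | 6.8 | 6.7 | 1.9 |
    | NTI-DIGITS | 6.5 | 6.4 | 2.4 |
    | DVS128 Gesture | 6.9 | 6.8 | 1.3 |
    | NMNIST | 6.7 | 6.6 | 0.8 |
    | Average | 6.7 | 6.6 | 1.6 |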

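    Table 5   Energy Efficiency Comparison Between Synchronous Counterpart Baseline and NosralC

    | Test dataset | Synchronous counterpart energy efficiency/(Mop/W) | NosralC energy efficiency/(Mop/W) | Increase/% |
    |---|---|---|---|
    | FSDD | 500.1 | 683.4 | 36.7 |
    | NTI-DIGITS | 493.7 | 714.3 | 44.7 |
    | DVS128 Gesture | 534.4 | 734.7 | 37.5 |
    | NMNIST | 461.2 | 680.8 | 47.6 |
    | Average | 497.4 | 703.3 | 41.4 |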
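
The percentage column of Table 5 can be rechecked from the two energy-efficiency columns. The sketch below assumes the increase is defined as (NosralC − baseline) / baseline, which this page does not state explicitly. The names `ENERGY_EFFICIENCY` and `percent_gain` are illustrative.

```python
# Energy efficiency in Mop/W from Table 5: (synchronous baseline, NosralC).
ENERGY_EFFICIENCY = {
    "FSDD": (500.1, 683.4),
    "NTI-DIGITS": (493.7, 714.3),
    "DVS128 Gesture": (534.4, 734.7),
    "NMNIST": (461.2, 680.8),
}


def percent_gain(baseline, improved):
    """Relative improvement of `improved` over `baseline`, in percent."""
    return (improved - baseline) / baseline * 100.0


# Round to one decimal place, the precision used in the table.
gains = {
    dataset: round(percent_gain(baseline, nosralc), 1)
    for dataset, (baseline, nosralc) in ENERGY_EFFICIENCY.items()
}

if __name__ == "__main__":
    for dataset, gain in gains.items():
        print(f"{dataset}: {gain}%")
```

Under that assumed definition the per-dataset gains span 36.7% to 47.6%, the range reported in the abstract.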

    Table 6   Resource Utilization Comparison Between Synchronous Counterpart Baseline and NosralC

    | FPGA resource | Synchronous counterpart utilization | NosralC utilization | Increase/% |
    |---|---|---|---|
    | LUT | 563697 | 595769 | 5.7 |
    | FF | 1332625 | 1367228 | 2.6 |
    | BUFG | 1 | 1 | 0 |
    | Average | | | 4.2 |

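    Table 7   Comparison of NosralC and State-of-the-Art Work

    | Configuration | TrueNorth | Loihi | NosralC |
    |---|---|---|---|
    | Process | 65 nm ASIC | 14 nm ASIC | FPGA |
    | Architecture | 2D Mesh | 2D CMesh | 2D Mesh |
    | Router | Asynchronous | Asynchronous | Synchronous |
    | Link | Asynchronous | Asynchronous | Asynchronous |
    | Number of nodes | 4096 | 131 | 256 |
    | Equivalent frequency/MHz | 0.001 | 153.8~243.9 | 20 |
    | Average latency/ns | 113~465 | 36 | 440 |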

Publication history
  • Received: 2021-10-17
  • Revised: 2022-06-06
  • Published online: 2023-02-10
  • Issue date: 2022-12-31
