ISSN 1000-1239 CN 11-1777/TP


    Journal of Computer Research and Development    2019, 56 (6): 1133-1134.  
    Brain-like Machine: Thought and Architecture
    Huang Tiejun, Yu Zhaofei, Liu Yijun
    Journal of Computer Research and Development    2019, 56 (6): 1135-1148.   DOI: 10.7544/issn1000-1239.2019.20190240
    The theoretical limitation of classical computing machinery, including all computers with the von Neumann architecture, was defined by Alan Turing in 1936. Owing to the lack of neuromorphic hardware devices, neural networks have for decades been implemented on conventional computers to realize artificial intelligence. However, the von Neumann architecture does not match the asynchronous parallel structure and communication mechanism of neural networks, with consequences such as huge power consumption. Developing neural-network-oriented architectures for artificial intelligence and general information processing is therefore an important direction for architecture research. A brain-like machine is an intelligent machine constructed with neuromorphic devices according to the structure of a biological neural network, and it outperforms the classic computer on spatio-temporal information processing. The idea of the brain-like machine was proposed before the invention of the computer, and research and development has been carried out for more than three decades. As one of the several brain-like systems now in operation, SpiNNaker focuses on the architecture of brain-like systems with an effective brain-like scheme. In the next 20 years or so, it is expected that the detailed analysis of model-animal brains and the human brain will be completed step by step, that neuromorphic devices and integration processes will gradually mature, and that a brain-like machine with structure close to the brain and performance far beyond it will be realized. As a kind of spiking neural network built from neuromorphic devices whose behavior is truly random, the brain-like machine can exhibit rich nonlinear dynamic behaviors. It has been proven that any Turing machine can be constructed with a spiking neural network. Whether the brain-like machine can transcend the theoretical limitation of the Turing machine remains a major open problem.
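    The spiking neurons the abstract refers to can be illustrated with a minimal leaky integrate-and-fire (LIF) model; this is a generic textbook sketch, not the neuron model of SpiNNaker or any specific brain-like machine:

```python
# Minimal leaky integrate-and-fire (LIF) neuron: the membrane potential v
# leaks by a factor tau each step, integrates the input current, and emits
# a spike (with a reset) whenever it crosses the threshold.
def lif_run(inputs, tau=0.9, threshold=1.0):
    v = 0.0
    spikes = []
    for current in inputs:
        v = tau * v + current      # leaky integration
        if v >= threshold:         # threshold crossing
            spikes.append(1)
            v = 0.0                # reset after spike
        else:
            spikes.append(0)
    return spikes

print(lif_run([0.6, 0.6, 0.0, 0.6, 0.6]))  # [0, 1, 0, 0, 1]
```

    Even this toy model shows the event-driven, nonlinear dynamics that a von Neumann machine must emulate step by step but that neuromorphic hardware realizes directly.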
    3D Memristor Array Based Neural Network Processing in Memory Architecture
    Mao Haiyu, Shu Jiwu
    Journal of Computer Research and Development    2019, 56 (6): 1149-1160.   DOI: 10.7544/issn1000-1239.2019.20190099
    Nowadays, due to the rapid development of artificial intelligence, memristor-based processing-in-memory (PIM) architectures for neural networks (NNs) attract much research interest, since they perform much better than the traditional von Neumann architecture. Equipped with peripheral circuits to support function units, memristor arrays can process a forward propagation with higher parallelism and much less data movement than CPUs and GPUs. However, memristor-based PIM hardware suffers from the large area overhead of the peripheral circuit outside the memristor array and non-trivial under-utilization of function units. This paper proposes FMC, a 3D memristor array based PIM architecture for NNs, which gathers the peripheral circuits of the function units into a function pool shared among the memristor arrays that pile up on the pool. We also propose a data-mapping scheme for this architecture to further increase the utilization of function units and reduce data transmission among different cubes. The software-hardware co-design not only makes the most of the function units but also shortens the wire interconnections for higher-performance and more energy-efficient data transmission. Experiments show that FMC can achieve up to 43.33 times the utilization of the function units when training a single neural network, and up to 58.51 times when training multiple neural networks. At the same time, compared with a 2D-PIM with the same amount of compute and storage arrays, FMC occupies only 42.89% of the area of 2D-PIM. Moreover, FMC provides 1.5 times speedup and 1.7 times energy saving compared with 2D-PIM.
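    The reason a memristor array computes a forward propagation with so little data movement is that the crossbar performs a whole matrix-vector product in one analog step: with conductances G[i][j] at the cross-points and input voltages V[j] applied to the rows, Kirchhoff's current law sums I[i] = Σ_j G[i][j]·V[j] on each column. A digital simulation of that physics (illustrative only, not the FMC design):

```python
# Each output current is the dot product of one conductance row with the
# input voltage vector -- the crossbar computes all rows simultaneously.
def crossbar_mvm(G, V):
    return [sum(g * v for g, v in zip(row, V)) for row in G]

G = [[0.5, 1.0],
     [2.0, 0.0]]           # programmed conductances (the layer's weights)
V = [1.0, 2.0]             # applied voltages (the layer's activations)
print(crossbar_mvm(G, V))  # column currents: [2.5, 2.0]
```

    The weights never leave the array; only the small activation vector moves, which is the source of the PIM bandwidth advantage the abstract describes.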
    A Secure Encryption Scheme for Deep Learning Accelerators
    Zuo Pengfei, Hua Yu, Xie Xinfeng, Hu Xing, Xie Yuan, Feng Dan
    Journal of Computer Research and Development    2019, 56 (6): 1161-1169.   DOI: 10.7544/issn1000-1239.2019.20190109
    With the rapid development of machine learning techniques, especially deep learning (DL), their application domains are becoming wider and wider, expanding from cloud computing to edge computing. In deep learning, DL models, as the intellectual property (IP) of model providers, become important data. We observe that DL accelerators deployed on edge devices risk leaking the DL models stored on them: attackers can easily obtain the model data by snooping the memory bus connecting the on-chip accelerator and the off-chip device memory. Therefore, encrypting the data transmitted on the memory bus is essential. However, directly applying memory encryption in DL accelerators significantly decreases their performance. To address this problem, this paper proposes COSA, a COunter-mode Secure deep learning Accelerator architecture. By leveraging counter-mode encryption, COSA achieves a higher security level than direct encryption and removes decryption operations from the critical path of memory accesses. We have implemented COSA in GPGPU-Sim and evaluated it using neural network workloads. Experimental results show that COSA improves the performance of the secure accelerator by over 3 times compared with direct encryption, and causes only a 13% performance decrease compared with an insecure accelerator that uses no encryption.
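    The latency-hiding trick behind counter mode is that the keystream pad depends only on (key, counter), not on the data, so it can be generated while the memory read is still in flight; decryption then costs a single XOR off the critical path. A toy sketch with a hash-based keystream (illustrative only; real counter mode uses a block cipher such as AES, and this is not the COSA design):

```python
import hashlib

# Keystream pad = PRF(key, counter); computable before the ciphertext arrives.
def pad(key: bytes, counter: int, n: int) -> bytes:
    return hashlib.sha256(key + counter.to_bytes(8, "big")).digest()[:n]

# In counter mode the same XOR performs both encryption and decryption.
def ctr_xcrypt(key: bytes, counter: int, data: bytes) -> bytes:
    return bytes(d ^ p for d, p in zip(data, pad(key, counter, len(data))))

key = b"device-key"
ct = ctr_xcrypt(key, 7, b"model weights")
print(ctr_xcrypt(key, 7, ct))  # round-trips to b'model weights'
```

    Direct encryption, by contrast, must run the full decryption after the data returns from memory, which is why it sits on the critical path.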
    Modeling Computational Feature of Multi-Layer Neural Network
    Fang Rongqiang, Wang Jing, Yao Zhicheng, Liu Chang, Zhang Weigong
    Journal of Computer Research and Development    2019, 56 (6): 1170-1181.   DOI: 10.7544/issn1000-1239.2019.20190111
    Deep neural networks (DNNs) have become an increasingly popular machine learning technique in applications, due to their ability to achieve high accuracy on tasks such as speech and image recognition. However, with the rapid growth in data scale and recognition precision, the topology of neural networks is becoming more and more complicated. How to design energy-efficient and programmable neural or deep learning accelerators therefore plays an essential role in next-generation computers. In this paper, we propose a layer-granularity analysis method that extracts computation-operation and memory-requirement features through a general expression and basic operation attributes. We also propose a max-value-replacement scheduling strategy, which schedules the computation hardware resources based on the network features we extract. Evaluation results show that our method can increase computational efficiency and lead to higher resource utilization.
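    The kind of per-layer features such an analysis extracts can be derived directly from layer shapes. A hedged sketch for a fully connected layer (the function name and feature set are hypothetical, not the paper's actual expressions):

```python
# For a fully connected layer of shape (in_dim, out_dim), the compute cost
# is one multiply-accumulate per weight, and the memory requirement is one
# stored value per weight.
def fc_features(in_dim, out_dim, bytes_per_weight=4):
    macs = in_dim * out_dim                            # multiply-accumulates
    weight_bytes = in_dim * out_dim * bytes_per_weight # weight storage
    return {"macs": macs, "weight_bytes": weight_bytes}

print(fc_features(784, 100))  # {'macs': 78400, 'weight_bytes': 313600}
```

    A scheduler can then rank layers by such features and assign hardware resources to the most demanding layer first, in the spirit of the max-value-replacement strategy.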
    Training and Software Simulation for ReRAM-Based LSTM Neural Network Acceleration
    Liu He, Ji Yu, Han Jianhui, Zhang Youhui, Zheng Weimin
    Journal of Computer Research and Development    2019, 56 (6): 1182-1191.   DOI: 10.7544/issn1000-1239.2019.20190113
    Long short-term memory (LSTM) is widely used in fields such as speech recognition and machine translation, owing to its strength in processing and predicting events with long intervals and long delays in time series. However, most existing neural-network acceleration chips cannot perform LSTM computation efficiently, limited by low memory bandwidth. ReRAM-based crossbars, on the other hand, can process matrix-vector multiplication efficiently due to their processing-in-memory (PIM) characteristic. However, a software tool for broad architectural exploration and end-to-end evaluation of ReRAM-based LSTM acceleration is still missing. This paper proposes a simulator for ReRAM-based LSTM neural network acceleration and a corresponding training algorithm. The highly configurable tools reflect the main features (including imperfections) of ReRAM devices and circuits, and the core computation of the simulation can be accelerated by general-purpose graphics processing units (GPGPUs). Moreover, the core component of the simulator has been verified against the circuit simulation of a real chip design. Within this framework, architectural exploration and comprehensive end-to-end evaluation can be achieved.
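    LSTM maps naturally onto ReRAM crossbars because every gate is a matrix-vector product followed by a pointwise nonlinearity. A minimal one-step LSTM cell sketch (standard textbook equations with toy dimensions and hypothetical weights; biases omitted, and not the paper's simulator):

```python
import math

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# One LSTM step: four matvecs (crossbar-friendly) plus pointwise gating.
def lstm_step(x, h, c, Wf, Wi, Wo, Wg):
    z = x + h                                     # concatenated [x; h]
    f = [sigmoid(v) for v in matvec(Wf, z)]       # forget gate
    i = [sigmoid(v) for v in matvec(Wi, z)]       # input gate
    o = [sigmoid(v) for v in matvec(Wo, z)]       # output gate
    g = [math.tanh(v) for v in matvec(Wg, z)]     # candidate cell state
    c = [fj * cj + ij * gj
         for fj, ij, gj, cj in zip(f, i, g, c)]   # cell update
    h = [oj * math.tanh(cj) for oj, cj in zip(o, c)]
    return h, c

W = [[0.5, 0.5]]  # hypothetical 1x2 weight matrix shared by all gates
h, c = lstm_step([1.0], [0.0], [0.0], W, W, W, W)
print(h, c)
```

    The four matvecs dominate the cost and are exactly what the crossbars execute in memory; the gating arithmetic is the small pointwise remainder.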
    Accelerating Fully Connected Layers of Sparse Neural Networks with Fine-Grained Dataflow Architectures
    Xiang Taoran, Ye Xiaochun, Li Wenming, Feng Yujing, Tan Xu, Zhang Hao, Fan Dongrui
    Journal of Computer Research and Development    2019, 56 (6): 1192-1204.   DOI: 10.7544/issn1000-1239.2019.20190117
    Deep neural networks (DNNs) are a hot, state-of-the-art technique widely used in applications such as face recognition, intelligent monitoring, image recognition, and text recognition. Because of their high computational complexity, many efficient hardware accelerators have been proposed to exploit a high degree of parallel processing for DNNs. However, the fully connected layers in a DNN have a large number of weight parameters, which imposes high bandwidth requirements on the accelerator. To reduce this bandwidth pressure, several DNN compression algorithms have been proposed, but accelerators implemented on FPGAs and ASICs usually sacrifice generality for higher performance and lower power consumption, making it difficult for them to accelerate sparse neural networks. Other accelerators, such as GPUs, are general enough but lead to higher power consumption. Fine-grained dataflow architectures, which break with conventional von Neumann architectures, show natural advantages in processing DNN-like algorithms with high computational efficiency and low power consumption, while remaining broadly applicable and adaptable. In this paper, we propose a scheme to accelerate sparse DNN fully connected layers on a hardware accelerator based on a fine-grained dataflow architecture. Compared with the original dense fully connected layers, the scheme reduces the peak bandwidth requirement by 2.44× to 6.17×. In addition, the utilization of the computational resources of the fine-grained dataflow accelerator running the sparse fully connected layers far exceeds that of other hardware platforms: it is 43.15%, 34.57%, and 44.24% higher than the CPU, GPU, and mGPU implementations, respectively.
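    The bandwidth saving from sparse fully connected layers comes from shipping only the nonzero weights (plus their indices) across the memory bus instead of the full dense matrix. A compressed sparse row (CSR) sketch of that computation (a generic illustration, not the paper's dataflow mapping):

```python
# Convert a dense weight matrix to CSR: values, column indices, row pointers.
def csr_from_dense(W):
    vals, cols, rowptr = [], [], [0]
    for row in W:
        for j, w in enumerate(row):
            if w != 0.0:
                vals.append(w)
                cols.append(j)
        rowptr.append(len(vals))
    return vals, cols, rowptr

# Sparse matrix-vector product: touch only the stored nonzeros per row.
def csr_matvec(vals, cols, rowptr, x):
    return [sum(vals[k] * x[cols[k]] for k in range(rowptr[r], rowptr[r + 1]))
            for r in range(len(rowptr) - 1)]

W = [[0.0, 2.0, 0.0],
     [1.0, 0.0, 0.0]]          # pruned weights: 2 nonzeros out of 6
vals, cols, rowptr = csr_from_dense(W)
print(csr_matvec(vals, cols, rowptr, [1.0, 1.0, 1.0]))  # [2.0, 1.0]
```

    The irregular, index-driven access pattern this produces is hard for rigid FPGA/ASIC pipelines but suits a fine-grained dataflow machine, where each nonzero's multiply-accumulate can fire as soon as its operands arrive.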