ISSN 1000-1239 CN 11-1777/TP

Highlights

    Review of Entity Relation Extraction Methods
    Li Dongmei, Zhang Yang, Li Dongyuan, Lin Danqiong
    Journal of Computer Research and Development    2020, 57 (7): 1424-1448.   DOI: 10.7544/issn1000-1239.2020.20190358
    Information extraction has long attracted extensive research attention in the field of natural language processing. It mainly includes three sub-tasks: entity extraction, relation extraction, and event extraction, among which relation extraction is the core task and a significant part of information extraction. The main goal of entity relation extraction is to identify and determine the specific relation between entity pairs in natural language text, which provides fundamental support for intelligent retrieval, semantic analysis, etc., and improves both search efficiency and the automatic construction of knowledge bases. We briefly review the development of entity relation extraction and introduce several relation extraction tools and evaluation systems for both Chinese and English. In addition, four main classes of entity relation extraction methods are discussed in this paper: traditional relation extraction methods and methods based on traditional machine learning, deep learning, and open-domain extraction. More importantly, we summarize the mainstream research methods and corresponding representative results of different historical stages, and conduct a comparative analysis of the different entity relation extraction methods. Finally, we forecast the contents and trends of future research.
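    To make the task concrete, the sketch below shows a toy pattern-based relation extractor in Python: hand-written regular expressions map a lexical cue to a relation label and return (head entity, relation, tail entity) triples. The patterns and relation names are illustrative assumptions, not any specific system surveyed in the paper.
```python
import re

# Hand-written patterns mapping a lexical cue to a relation label -- a toy
# illustration of rule/pattern-based relation extraction.
PATTERNS = [
    (re.compile(r"(?P<e1>[A-Z]\w+) was born in (?P<e2>[A-Z]\w+)"), "born_in"),
    (re.compile(r"(?P<e1>[A-Z]\w+) works for (?P<e2>[A-Z]\w+)"), "employed_by"),
    (re.compile(r"(?P<e1>[A-Z]\w+) is located in (?P<e2>[A-Z]\w+)"), "located_in"),
]

def extract_relations(sentence):
    """Return (head entity, relation, tail entity) triples found in a sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        for m in pattern.finditer(sentence):
            triples.append((m.group("e1"), relation, m.group("e2")))
    return triples

print(extract_relations("Turing was born in London. Alice works for Google."))
# [('Turing', 'born_in', 'London'), ('Alice', 'employed_by', 'Google')]
```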
    An Energy Consumption Optimization and Evaluation for Hybrid Cache Based on Reinforcement Learning
    Fan Hao, Xu Guangping, Xue Yanbing, Gao Zan, Zhang Hua
    Journal of Computer Research and Development    2020, 57 (6): 1125-1139.   DOI: 10.7544/issn1000-1239.2020.20200010
    Emerging non-volatile STT-RAM memory is characterized by low leakage power, high density, fast read speed, and high write energy, while SRAM is characterized by high leakage power, low density, fast read and write speed, and low write energy. A hybrid cache of SRAM and STT-RAM exploits the respective advantages of both memory media, providing lower leakage power and higher cell density than SRAM, and higher write speed and lower write energy than STT-RAM. A hybrid cache architecture achieves these benefits mainly by placing write-intensive data in SRAM and read-intensive data in STT-RAM, so identifying and allocating read- and write-intensive data is the key challenge in hybrid cache design. This paper proposes a cache management method based on reinforcement learning that uses the write intensity and reuse information of cache access requests to design a cache allocation policy and optimize energy consumption. The key idea is to use a reinforcement learning algorithm to obtain, for each set, the weights for allocating to SRAM or STT-RAM by learning from the energy consumption of cache line sets; the algorithm then allocates a cache line in a set to the region with the greater weight. Evaluations show that the proposed policy reduces average energy consumption by 16.9% (9.7%) in a single-core (quad-core) system compared with previous policies.
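    The following Python sketch only illustrates the flavor of such a weight-learning allocation policy under stated assumptions: each cache set keeps one weight per region, the weight is updated from a reward equal to the negative access energy, and a new line is placed in the region with the greater weight. The energy figures, learning rate, and exploration rate are made-up placeholders, not the paper's measured values or exact algorithm.
```python
import random

# Illustrative per-access energy figures (nJ) -- placeholders, not measured values.
ENERGY = {"SRAM": {"read": 0.2, "write": 0.3},
          "STT-RAM": {"read": 0.3, "write": 1.5}}
ALPHA, EPSILON = 0.1, 0.05  # learning rate and exploration rate

class HybridSetAllocator:
    """Bandit-style sketch: each cache set keeps one weight per region and
    allocates new lines to the region whose learned weight (an estimate of
    negative expected energy) is larger."""
    def __init__(self, num_sets):
        self.w = [{"SRAM": 0.0, "STT-RAM": 0.0} for _ in range(num_sets)]

    def allocate(self, set_idx):
        if random.random() < EPSILON:                 # occasional exploration
            return random.choice(["SRAM", "STT-RAM"])
        return max(self.w[set_idx], key=self.w[set_idx].get)

    def feedback(self, set_idx, region, op):
        reward = -ENERGY[region][op]                  # lower energy -> higher reward
        self.w[set_idx][region] += ALPHA * (reward - self.w[set_idx][region])

alloc = HybridSetAllocator(num_sets=64)
for op in ["write"] * 20 + ["read"] * 20:             # toy access stream for set 0
    region = alloc.allocate(0)
    alloc.feedback(0, region, op)
print(alloc.w[0])   # sets with costly write traffic drift toward SRAM
```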
    Optimizing Winograd-Based Fast Convolution Algorithm on Phytium Multi-Core CPUs
    Wang Qinglin, Li Dongsheng, Mei Songzhu, Lai Zhiquan, Dou Yong
    Journal of Computer Research and Development    2020, 57 (6): 1140-1151.   DOI: 10.7544/issn1000-1239.2020.20200107
    Convolutional neural networks (CNNs) have been extensively used in artificial intelligence fields such as computer vision and natural language processing. Winograd-based fast convolution algorithms can effectively reduce the computational complexity of convolution operations in CNNs and have therefore attracted great attention. With the application of the Phytium multi-core CPUs independently developed by the National University of Defense Technology in artificial intelligence fields, there is a strong demand for high-performance convolution primitives on Phytium multi-core CPUs. This paper proposes a new high-performance parallel Winograd-based fast convolution algorithm after studying the architectural characteristics of Phytium multi-core CPUs and the computational characteristics of Winograd-based fast convolution algorithms. The new parallel algorithm does not rely on general matrix multiplication routines and consists of four stages: kernel transformation, input feature map transformation, element-wise multiplication, and output feature map inverse transformation. The data movements in all four stages are collaboratively optimized to improve the memory access performance of the algorithm. Custom data layouts, multi-level parallel data transformation algorithms, and a multi-level parallel matrix multiplication algorithm are also proposed to support the above optimization efficiently. The algorithm is tested on two Phytium multi-core CPUs. Compared with the Winograd-based fast convolution implementations in the ARM Compute Library (ACL) and NNPACK, the algorithm achieves speedups of 1.05~16.11 times and 1.66~16.90 times, respectively. The application of the algorithm in the open source framework MXNet improves the forward-propagation performance of the VGG16 network by 3.01~6.79 times.
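    As a minimal illustration of the four stages, the NumPy sketch below computes one F(2x2, 3x3) Winograd output tile with the standard transform matrices and checks it against direct convolution; the paper's blocked, multi-core, GEMM-free implementation is of course far more involved.
```python
import numpy as np

# Winograd F(2x2, 3x3) transform matrices (standard minimal-filtering form).
G = np.array([[1.0, 0.0, 0.0],
              [0.5, 0.5, 0.5],
              [0.5, -0.5, 0.5],
              [0.0, 0.0, 1.0]])
Bt = np.array([[1.0, 0.0, -1.0, 0.0],
               [0.0, 1.0, 1.0, 0.0],
               [0.0, -1.0, 1.0, 0.0],
               [0.0, 1.0, 0.0, -1.0]])
At = np.array([[1.0, 1.0, 1.0, 0.0],
               [0.0, 1.0, -1.0, -1.0]])

def winograd_f2x2_3x3(tile, kernel):
    """Compute a 2x2 output block from a 4x4 input tile and a 3x3 kernel."""
    U = G @ kernel @ G.T          # stage 1: kernel transformation (4x4)
    V = Bt @ tile @ Bt.T          # stage 2: input tile transformation (4x4)
    M = U * V                     # stage 3: element-wise multiplication
    return At @ M @ At.T          # stage 4: output inverse transformation (2x2)

# Check against direct "valid" correlation on a random tile.
rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))
g = rng.standard_normal((3, 3))
direct = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)] for i in range(2)])
assert np.allclose(winograd_f2x2_3x3(d, g), direct)
```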
    Efficient Optimization of Graph Computing on High-Throughput Computer
    Zhang Chenglong, Cao Huawei, Wang Guobo, Hao Qinfen, Zhang Yang, Ye Xiaochun, Fan Dongrui
    Journal of Computer Research and Development    2020, 57 (6): 1152-1163.   DOI: 10.7544/issn1000-1239.2020.20200115
    With the rapid development of computing technology, the scale of graphs is increasing explosively, and large-scale graph computing has been a focus of research in recent years. Breadth-first search (BFS) is a classic algorithm for graph traversal. It is the main kernel of the Graph500 benchmark, which evaluates the performance of supercomputers and servers on data-intensive applications. The high-throughput computer (HTC) adopts an ARM-based many-core architecture with the characteristics of high concurrency, strong real-time performance, and low power consumption. The optimization of the BFS algorithm has made a series of advances on single-node systems. In this paper, we first introduce the parallel BFS algorithm and existing optimizations. Then we propose two optimization techniques for HTC that improve the efficiency of data access and data locality. We systematically evaluate the performance of the BFS algorithm on HTC. For the Kronecker graph with scale=30 (2^30 vertices and 2^34 edges), the average performance on HTC is 24.26 GTEPS, which is 1.18 times faster than a two-way x86 server. In terms of energy efficiency, the result on HTC is 181.04 MTEPS/W, ranking 2nd on the June 2019 Green Graph500 big data list. To the best of our knowledge, this is the first work that evaluates BFS performance on the HTC platform. HTC is suitable for data-intensive applications such as large-scale graph computing.
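    For reference, a minimal level-synchronous BFS and a rough TEPS figure can be sketched in a few lines of Python; note that the official Graph500 rules count traversed edges somewhat differently, and the real kernel on HTC is a highly optimized parallel implementation.
```python
import time

def bfs_levels(adj, source):
    """Level-synchronous BFS; returns the parent array and the number of
    edges examined while expanding the frontier."""
    parent = [-1] * len(adj)
    parent[source] = source
    frontier, edges_traversed = [source], 0
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                edges_traversed += 1
                if parent[v] == -1:
                    parent[v] = u
                    next_frontier.append(v)
        frontier = next_frontier
    return parent, edges_traversed

# Toy graph (adjacency lists); TEPS here is simply edges / elapsed time.
adj = [[1, 2], [0, 3], [0, 3], [1, 2, 4], [3]]
t0 = time.perf_counter()
parent, m = bfs_levels(adj, 0)
teps = m / (time.perf_counter() - t0)
print(parent, m, f"{teps:.2e} TEPS")
```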
    Programming and Developing Environment for FPGA Graph Processing: Survey and Exploration
    Guo Jinyang, Shao Chuanming, Wang Jing, Li Chao, Zhu Haojin, Guo Minyi
    Journal of Computer Research and Development    2020, 57 (6): 1164-1178.   DOI: 10.7544/issn1000-1239.2020.20200106
    Due to their high performance and efficiency, graph processing accelerators based on the reconfigurable field-programmable gate array (FPGA) architecture have attracted much attention, as they can support complex graph applications with diverse basic operations and large-scale graph data. However, efficient code design for FPGAs takes a long time, while existing functional programming environments cannot achieve the desired performance. The programming-wall problem on FPGAs is therefore significant and has become a serious obstacle to designing dedicated accelerators. A well-designed programming environment is necessary for the further popularity of FPGA-based graph processing accelerators; it calls for convenient application programming interfaces, scalable programming models, efficient high-level synthesis tools, and a domain-specific language that can integrate software/hardware features and generate high-performance underlying code. In this article, we make a systematic exploration of the programming environment for FPGA graph processing. We mainly introduce and analyze programming models, high-level synthesis, programming languages, and the related hardware frameworks. In addition, we introduce the development of FPGA-based graph processing accelerators in China and abroad. Finally, we discuss the open issues and challenges in this area.
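    To give a feel for what such a programming model hides from the accelerator developer, here is a hypothetical vertex-centric (gather-apply) interface sketched in Python; the class and method names are made up for illustration, and an FPGA backend would replace the reference executor with pipelined hardware rather than this loop.
```python
# Hypothetical vertex-centric interface: a user supplies gather/apply,
# the framework decides how to execute them (here, a plain software loop).
class VertexProgram:
    def gather(self, src_value, edge_weight):    # combine a neighbour's value
        raise NotImplementedError
    def apply(self, old_value, gathered):        # update the vertex state
        raise NotImplementedError

class PageRankLike(VertexProgram):
    def gather(self, src_value, edge_weight):
        return src_value * edge_weight
    def apply(self, old_value, gathered):
        return 0.15 + 0.85 * gathered

def run(program, edges, values, iters=10):
    """Reference executor; an FPGA backend would pipeline gather/apply instead."""
    for _ in range(iters):
        acc = {v: 0.0 for v in values}
        for src, dst, w in edges:
            acc[dst] += program.gather(values[src], w)
        values = {v: program.apply(values[v], acc[v]) for v in values}
    return values

edges = [(0, 1, 1.0), (1, 2, 0.5), (2, 0, 0.5), (2, 1, 0.5)]
print(run(PageRankLike(), edges, {0: 1.0, 1: 1.0, 2: 1.0}))
```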
    A Cross-Layer Memory Tracing Toolkit for Big Data Application Based on Spark
    Xu Danya, Wang Jing, Wang Li, Zhang Weigong
    Journal of Computer Research and Development    2020, 57 (6): 1179-1190.   DOI: 10.7544/issn1000-1239.2020.20200109
    Spark has been increasingly employed by industry for big data analytics due to its efficient in-memory distributed programming model. Most existing Spark optimization and analysis tools work at either the application layer or the operating system layer in isolation, which separates Spark semantics from the underlying system behavior. For example, without knowing the impact of operating system parameters on the performance of the Spark layer, it is unclear how to use OS parameters to tune system performance. In this paper, we propose SMTT, a new Spark memory tracing toolkit, which establishes the semantics between the upper-level application and the underlying physical hardware across the Spark, JVM, and OS layers. Based on the characteristics of Spark memory, we design tracing schemes for execution memory and storage memory respectively. We then use SMTT to analyze the Spark iterative computation process and the usage of execution/storage memory. An experiment on RDD memory assessment shows that our toolkit can be effectively used for performance analysis and can provide guidance for optimizing the Spark memory system.
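    SMTT itself instruments the Spark, JVM, and OS layers; the small Python sketch below only illustrates the cross-layer idea under simple assumptions: OS-level memory samples (VmRSS from Linux /proc) are tagged with an application-level phase label such as a Spark stage name, so the two views can be correlated later. The spark_executor_pid and phase strings are hypothetical.
```python
import time

def rss_kb(pid: int) -> int:
    """Read the resident set size (VmRSS) of a process from /proc -- the
    OS-layer view of its memory footprint (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        for line in f:
            if line.startswith("VmRSS:"):
                return int(line.split()[1])   # value is reported in kB
    return 0

def sample(pid: int, phase: str, n: int = 5, interval: float = 1.0):
    """Tag OS-level memory samples with an application-level phase label
    so application and OS views can be joined in later analysis."""
    samples = []
    for _ in range(n):
        samples.append((time.time(), phase, rss_kb(pid)))
        time.sleep(interval)
    return samples

# Example (hypothetical pid and stage name):
# samples = sample(spark_executor_pid, phase="stage-3: reduceByKey")
```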
    Performance Optimization of Cache Subsystem in General Purpose Graphics Processing Units: A Survey
    Zhang Jun, Xie Jingcheng, Shen Fanfan, Tan Hai, Wang Lümeng, He Yanxiang
    Journal of Computer Research and Development    2020, 57 (6): 1191-1207.   DOI: 10.7544/issn1000-1239.2020.20200113
    With the development of process technology and the improvement of architecture, the parallel computing performance of GPGPUs (general purpose graphics processing units) has improved greatly, and GPGPUs are applied more and more widely in high-performance and high-throughput fields. A GPGPU obtains high parallel computing performance because it can hide the long latency of memory accesses by supporting thousands of concurrent threads. However, due to irregular computation and memory accesses in some applications, the performance of the memory subsystem suffers considerably; in particular, contention in the on-chip cache can become serious, so the GPGPU cannot reach its peak performance. Alleviating this contention and optimizing the performance of the on-chip cache have become major approaches to GPGPU optimization. At present, studies on the performance optimization of the on-chip cache focus on five aspects: TLP (thread-level parallelism) throttling, memory access reordering, data flux enhancement, LLC (last-level cache) optimization, and new architecture designs based on NVM (non-volatile memory). This paper discusses the research methods for on-chip cache performance optimization from these aspects. Finally, some promising future research directions for on-chip cache optimization are discussed. The contents of this paper are of significance for research on the cache subsystem in GPGPUs.
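    As a toy illustration of the first of these aspects, TLP throttling, the sketch below adjusts the number of active warps from a sampled L1 miss rate; the thresholds and step size are arbitrary illustrative values rather than any published policy.
```python
# Toy feedback loop for TLP throttling: reduce the number of active warps
# when the L1 miss rate is high and raise it when the cache has headroom.
def adjust_active_warps(active, max_warps, miss_rate,
                        high=0.6, low=0.3, step=2):
    if miss_rate > high and active > step:                  # thrashing: throttle down
        return active - step
    if miss_rate < low and active + step <= max_warps:      # headroom: throttle up
        return active + step
    return active

active = 48
for miss_rate in [0.7, 0.72, 0.65, 0.4, 0.25, 0.2]:         # sampled per interval
    active = adjust_active_warps(active, max_warps=48, miss_rate=miss_rate)
    print(miss_rate, "->", active, "active warps")
```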
    Research Advances in the Interpretability of Deep Learning
    Cheng Keyang, Wang Ning, Shi Wenxi, Zhan Yongzhao
    Journal of Computer Research and Development    2020, 57 (6): 1208-1217.   DOI: 10.7544/issn1000-1239.2020.20190485
    Research on the interpretability of deep learning is closely related to various disciplines such as artificial intelligence, machine learning, logic, and cognitive psychology. It has important theoretical significance and practical application value in many fields, such as information push, medical research, finance, and information security. In the past few years, a large body of work has been carried out in this field, but various issues remain. In this paper, we review the history of deep learning interpretability research and related work. First, we introduce the history of interpretable deep learning from three aspects: the origin of interpretable deep learning, the research exploration stage, and the model construction stage. Then, the state of research is presented from three aspects, namely visual analysis, robust perturbation analysis, and sensitivity analysis. Research on the construction of interpretable deep learning models is introduced from four aspects: model agents, logical reasoning, network node association analysis, and traditional machine learning models. Moreover, the limitations of current research are analyzed and discussed. Finally, we list typical applications of interpretable deep learning and forecast possible future research directions of this field along with suggestions.
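    As one small example of perturbation-based sensitivity analysis, the sketch below slides an occluding patch over an image and records the drop in the target-class score; `model` is a hypothetical callable returning class probabilities, and this is only one simple instance of the analysis families surveyed.
```python
import numpy as np

def occlusion_sensitivity(model, image, target_class, patch=8, stride=8):
    """Slide a mean-valued patch over the image and record the drop in the
    target-class score -- a simple perturbation-based sensitivity map."""
    h, w = image.shape[:2]
    base = model(image)[target_class]
    heat = np.zeros(((h - patch) // stride + 1, (w - patch) // stride + 1))
    for i, y in enumerate(range(0, h - patch + 1, stride)):
        for j, x in enumerate(range(0, w - patch + 1, stride)):
            occluded = image.copy()
            occluded[y:y + patch, x:x + patch] = image.mean()
            heat[i, j] = base - model(occluded)[target_class]
    return heat   # large values mark regions the prediction depends on

# Usage (hypothetical): heat = occlusion_sensitivity(model, img, target_class=3)
```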
    Computation Protocols: Analyzable Abstractions for Computing Systems
    Xu Zhiwei, Wang Yifan, Zhao Yongwei, Li Chundian
    Journal of Computer Research and Development    2020, 57 (5): 897-905.   DOI: 10.7544/issn1000-1239.2020.20200058
    Computing systems research is entering an era of diversity. At the same time, systems research still mainly follows the prototype development and benchmark evaluation approach, making the research cost too high to address the diversity challenge. This dilemma calls for new analyzable abstractions of computing systems. When researching a new system, we can use its abstraction to analyze its characteristics and filter out inappropriate candidate systems before costly prototyping and benchmarking. We already have such a concept for computer applications, called the algorithm. Before an algorithm's implementation and benchmark evaluation, we can usually analyze its main properties, such as time complexity and space complexity. In this paper, we summarize seven advantages of the algorithm concept and propose a preliminary counterpart for computing systems, called the computation protocol. Learning from six historical lessons from systems research, we discuss a general definition, a black-box representation, and a white-box representation of the computation protocol concept. We use preliminary examples to point out that computation protocol thinking may be helpful for proposing computing systems conjectures, analyzing new parallel computing models, extending existing system architectures, and inspiring new system evaluation methods.
    Brilliance and Darkness: Turing Test
    Yu Jian
    Journal of Computer Research and Development    2020, 57 (5): 906-911.   DOI: 10.7544/issn1000-1239.2020.20190794
    In this paper we discuss the Turing Test and its modifications, study its theoretical presumptions and practical feasibility, and briefly survey the development of the Turing Test. By analyzing the presumptions of a classical concept definition, the basic assumptions of the Turing Test are demonstrated. The paper clearly shows that the basic assumptions of the Turing Test are not consistent with humans' daily life and social science, which poses great theoretical challenges for artificial intelligence research.
    Survey on Secure Persistent Memory Storage
    Yang Fan, Li Fei, Shu Jiwu
    Journal of Computer Research and Development    2020, 57 (5): 912-927.   DOI: 10.7544/issn1000-1239.2020.20190820
    With the rapid development of computer technology, computer security and data privacy protection have always been a focus of academia and industry. By providing hardware-assisted confidentiality and integrity verification, memory security mechanisms help guarantee the security of application code and data and prevent malicious memory disclosure and modification. Emerging persistent memory delivers a unique combination of affordable large capacity and support for data persistence, and provides high-bandwidth, low-latency data access; it can be placed on the memory bus like DRAM and accessed via processor loads and stores. However, due to differences in media characteristics, DRAM-oriented memory security mechanisms cannot function efficiently on persistent memory and may even have availability issues. Therefore, a secure memory storage system based on persistent memory brings new opportunities for the secure and efficient in-memory storage of big data. Firstly, in view of the write characteristics of persistent memory, the reasons why security mechanisms designed for traditional volatile memory are inefficient on persistent memory are analyzed, and related work is reviewed. Secondly, for persistent memory storage, we analyze the problems that need to be considered to ensure the security of persistent memory over its whole life cycle, and introduce research work on guaranteeing the consistency between data and the corresponding security metadata. Finally, we summarize the challenges, compare related work on building secure memory storage based on persistent memory, and share our views on future research.
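    As a software analogue of the integrity-verification idea, the sketch below MACs each memory block together with its address and a version counter so that modified, relocated, or replayed blocks are detected on read; real hardware schemes rely on counter-mode encryption and integrity trees, which this toy example does not attempt to model.
```python
import hashlib
import hmac
import os

KEY = os.urandom(32)   # hypothetical per-boot integrity key

def mac_block(addr: int, data: bytes, version: int) -> bytes:
    """MAC over (address, version counter, data): a replayed, relocated, or
    modified block fails verification because one of the three fields changes."""
    msg = addr.to_bytes(8, "little") + version.to_bytes(8, "little") + data
    return hmac.new(KEY, msg, hashlib.sha256).digest()

def verify_block(addr: int, data: bytes, version: int, tag: bytes) -> bool:
    return hmac.compare_digest(mac_block(addr, data, version), tag)

# Write path: persist (data, version, tag); read path: recompute and compare.
data, version = b"persistent cache line".ljust(64, b"\0"), 1
tag = mac_block(0x1000, data, version)
assert verify_block(0x1000, data, version, tag)
assert not verify_block(0x2000, data, version, tag)   # relocation is detected
```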
    An Overview of Monaural Speech Denoising and Dereverberation Research
    Lan Tian, Peng Chuan, Li Sen, Ye Wenzheng, Li Meng, Hui Guoqiang, Lü Yilan, Qian Yuxin, Liu Qiao
    Journal of Computer Research and Development    2020, 57 (5): 928-953.   DOI: 10.7544/issn1000-1239.2020.20190306
    Speech enhancement refers to the use of audio signal processing techniques and various algorithms to improve the intelligibility and quality of distorted speech signals. It has great research value and a wide range of applications, including speech recognition, VoIP, teleconferencing, and hearing aids. Most early work utilized unsupervised digital signal analysis methods to decompose the speech signal and obtain the characteristics of the clean speech and the noise. With the development of machine learning, supervised methods that aim to learn the relationship between noisy and clean speech signals were proposed. In particular, the introduction of deep learning has greatly improved performance. In order to help beginners and related researchers understand the current state of research on this topic, this paper conducts a comprehensive survey of the development of monaural speech enhancement, and systematically
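    As one concrete example of the early unsupervised, signal-analysis style of method, the NumPy sketch below implements basic spectral subtraction: the noise magnitude is estimated from the first few frames (assumed noise-only), subtracted from every frame's magnitude, and the signal is resynthesized with the noisy phase via weighted overlap-add. Frame length, hop, and the spectral floor are illustrative choices, not values from the survey.
```python
import numpy as np

def spectral_subtraction(noisy, frame=512, hop=256, noise_frames=5):
    """Basic magnitude spectral subtraction with weighted overlap-add."""
    win = np.hanning(frame)
    starts = range(0, len(noisy) - frame + 1, hop)
    spec = np.array([np.fft.rfft(noisy[i:i + frame] * win) for i in starts])
    noise_mag = np.abs(spec[:noise_frames]).mean(axis=0)          # noise estimate
    mag = np.maximum(np.abs(spec) - noise_mag, 0.05 * noise_mag)  # spectral floor
    clean = mag * np.exp(1j * np.angle(spec))                     # keep noisy phase
    out = np.zeros(len(noisy))
    norm = np.zeros(len(noisy))
    for k, i in enumerate(starts):
        out[i:i + frame] += np.fft.irfft(clean[k], n=frame) * win
        norm[i:i + frame] += win ** 2
    return out / np.maximum(norm, 1e-8)

# Toy usage: a 440 Hz tone in white noise, preceded by a noise-only segment.
rng = np.random.default_rng(0)
fs = 16000
noise = 0.5 * rng.standard_normal(fs)
clean = np.concatenate([np.zeros(fs // 4),
                        np.sin(2 * np.pi * 440 * np.arange(3 * fs // 4) / fs)])
enhanced = spectral_subtraction(clean + noise)
```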