ISSN 1000-1239 CN 11-1777/TP

Highlights

    An Overview of Quantum Optimization
    He Jianhao Li Lüzhou
    Journal of Computer Research and Development    2021, 58 (9): 1823-1834.   DOI: 10.7544/issn1000-1239.2021.20210276
    Quantum optimization has attracted much attention in recent years. It mainly studies how to accelerate the solution of optimization problems with quantum computing. This overview classifies quantum optimization algorithms according to whether the optimization variables are continuous, and focuses on introducing continuous-variable optimization algorithms. Through the investigation of existing work, this article obtains the following observations: 1) works on discrete-variable quantum optimization mainly appeared more than five years ago, while continuous-variable quantum optimization has attracted more attention in the last five years; 2) the main basic technologies used in quantum optimization were proposed ten to twenty years ago, and fundamental innovations are needed; 3) most works on quantum optimization achieve a theoretical speedup in time complexity or query complexity, but more rigorous theoretical analysis is still needed; 4) many problems in the optimization field are still worth exploring by quantum computing researchers, especially in non-convex optimization, which is considered difficult in classical computing.
    Survey of Deep Learning Based Graph Anomaly Detection Methods
    Chen Bofeng, Li Jingdong, Lu Xingjian, Sha Chaofeng, Wang Xiaoling, Zhang Ji
    Journal of Computer Research and Development    2021, 58 (7): 1436-1455.   DOI: 10.7544/issn1000-1239.2021.20200685
    Graph anomaly detection aims to find “strange” or “unusual” patterns in large graphs or massive graph databases, and it has a wide range of application scenarios. Deep learning can learn hidden rules from data and performs excellently in extracting potentially complex patterns. With the great development of graph representation learning in recent years, how to detect graph anomalies using deep learning methods has attracted extensive attention in both academia and industry. Although a series of recent studies have investigated anomaly detection methods from the perspective of graphs, there is a lack of attention to graph anomaly detection methods in the context of deep learning. In this paper, we first give the definitions of various kinds of anomalies in static graphs and dynamic graphs, and investigate deep neural network based graph representation learning methods and their various applications in graph anomaly detection. Then we present the current state of research on deep learning based graph anomaly detection from the perspectives of static graphs and dynamic graphs, and summarize the application scenarios and related data sets of graph anomaly detection. At last, we discuss the current challenges and future research directions of graph anomaly detection.
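    As a toy illustration of the representation-learning view of anomaly detection described above (and not any specific method from the surveyed papers), the sketch below embeds the nodes of a small static graph with a truncated SVD of its adjacency matrix and scores each node by how poorly its adjacency row is reconstructed; a high reconstruction error flags a structurally "unusual" node. All function and variable names are hypothetical.

```python
import numpy as np

def anomaly_scores(adj: np.ndarray, rank: int = 2) -> np.ndarray:
    """Score nodes by the reconstruction error of a low-rank embedding.

    adj  : dense symmetric adjacency matrix of a static graph
    rank : embedding dimension kept from the SVD
    """
    # Low-rank "representation" of the graph structure.
    u, s, vt = np.linalg.svd(adj, full_matrices=False)
    recon = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    # Per-node error: nodes whose neighborhoods the embedding cannot
    # explain well are treated as anomalous.
    return np.linalg.norm(adj - recon, axis=1)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 20
    # Two dense communities ...
    adj = np.zeros((n, n))
    adj[:10, :10] = adj[10:, 10:] = 1.0
    np.fill_diagonal(adj, 0.0)
    # ... plus one node rewired to random cross-community edges.
    adj[0, :] = adj[:, 0] = 0.0
    for j in rng.choice(np.arange(1, n), size=5, replace=False):
        adj[0, j] = adj[j, 0] = 1.0
    scores = anomaly_scores(adj, rank=2)
    print("most anomalous node:", int(scores.argmax()))
```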
    Survey of OpenFlow Switch Flow Table Overflow Mitigation Techniques
    Xie Shengxu, Xing Changyou, Zhang Guomin, Song Lihua, Hu Guyu
    Journal of Computer Research and Development    2021, 58 (7): 1544-1562.   DOI: 10.7544/issn1000-1239.2021.20200480
    The features of software defined networking (SDN), such as forwarding and control separation, centralized control, and open interfaces, make the network flexible and controllable, and its architecture has been fully developed. Because it combines well with various cloud services, SDN has seen a large number of commercial deployments in recent years. In the OpenFlow-based SDN architecture, ternary content addressable memory (TCAM) is mostly used on hardware switches to store the flow entries installed by the controller, in order to achieve goals such as fast flow entry lookup and mask matching. However, limited by the capacity and price of TCAM, current commercial OpenFlow switches can store at most tens of thousands of flow entries, so burst traffic or flow table overflow attacks can cause the flow table to overflow, which seriously affects network performance. How to establish an efficient flow table overflow mitigation mechanism has therefore attracted extensive attention from researchers. Firstly, the causes and effects of the flow table overflow problem in OpenFlow switches are discussed. On this basis, the current research status of flow table overflow mitigation technology is summarized and compared for the two situations of burst traffic and attack behavior. Finally, the open research problems are summarized and analyzed, and future development directions and challenges are forecast.
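    To make the capacity pressure concrete, the toy model below simulates a fixed-capacity flow table with least-recently-used eviction and shows how a burst of new flows larger than the table forces evictions. It is a sketch of the general idea only, not the OpenFlow protocol or any controller's API; the class and field names are made up, and real switches combine idle/hard timeouts, aggregation, and controller-driven policies.

```python
from collections import OrderedDict

class ToyFlowTable:
    """Fixed-capacity flow table with least-recently-used eviction.

    A stand-in for a TCAM that can hold only `capacity` flow entries.
    """

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()   # match -> action
        self.evictions = 0

    def lookup(self, match):
        if match in self.entries:
            self.entries.move_to_end(match)    # refresh recency
            return self.entries[match]
        return None                            # table miss -> packet-in

    def install(self, match, action):
        if match not in self.entries and len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the coldest entry
            self.evictions += 1
        self.entries[match] = action
        self.entries.move_to_end(match)

if __name__ == "__main__":
    table = ToyFlowTable(capacity=3)
    # A burst of new flows larger than the table forces evictions.
    for flow_id in range(10):
        match = ("10.0.0.%d" % flow_id, 80)
        if table.lookup(match) is None:
            table.install(match, "forward:1")
    print("evictions caused by the burst:", table.evictions)
```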
    Agile Design of Processor Chips: Issues and Challenges
    Bao Yungang, Chang Yisong, Han Yinhe, Huang Libo, Li Huawei, Liang Yun, Luo Guojie, Shang Li, Tang Dan, Wang Ying, Xie Biwei, Yu Wenjian, Zhang Ke, Sun Ninghui
    Journal of Computer Research and Development    2021, 58 (6): 1131-1145.   DOI: 10.7544/issn1000-1239.2021.20210232
    Design of processor chips currently relies on a performance-oriented design method that focuses on hybrid optimization among chip frequency, area, and power consumption through multi-step, repetitive iterations with modern electronic design automation (EDA) techniques. Such a conventional methodology results in significant cost, long design cycles, and a high technical threshold. In this paper, we introduce an object-oriented architecture (OOA) paradigm, with the idea borrowed from the software engineering area, and propose an OOA-based agile processor design methodology. Unlike the conventional performance-oriented design method, the proposed OOA-based agile design method mainly aims to shorten the development cycle and to reduce cost and complexity without sacrificing performance and reliability; these goals are captured by a new metric, the agile degree. OOA is expected to provide a series of decomposable, composable, and extensible objects in architectures of both general-purpose CPUs and application-specific XPUs via the object-oriented design paradigm, language, and EDA tools. We further summarize the research progress in each technical field covered by OOA, and analyze the challenges that may arise in future research on the OOA-based agile design methodology.
    A Proposal of Software-Hardware Decoupling Hardware Design Method for Brain-Inspired Computing
    Qu Peng, Chen Jiajie, Zhang Youhui, Zheng Weimin
    Journal of Computer Research and Development    2021, 58 (6): 1146-1154.   DOI: 10.7544/issn1000-1239.2021.20210170
    Brain-inspired computing is a novel research field involving multiple disciplines, which may have important implications for the development of computational neuroscience, artificial intelligence, and computer architectures. Currently, one of the key problems in this field is that brain-inspired software and hardware are usually tightly coupled. A recent study has proposed the notion of neuromorphic completeness and a corresponding system hierarchy design. This completeness provides theoretical support for decoupling the hardware and software of brain-inspired computing systems, and the system hierarchy design can be viewed as a reference implementation of neuromorphic complete software and hardware. As a position paper, this article first discusses several key concepts of neuromorphic completeness and the system hierarchy for brain-inspired computing. Then, as follow-up work, we propose a software-hardware decoupled hardware design method for brain-inspired computing, namely an iterative optimization process consisting of execution primitive set design and hardware implementation evaluation. Finally, we show the preliminary status of our research on the FPGA based evaluation platform. We believe that this method will contribute to the realization of extensible, neuromorphic complete computation primitive sets and chips, which is beneficial for realizing the decoupling of hardware and software in brain-inspired computing systems.
    Survey on Graph Neural Network Acceleration Architectures
    Li Han, Yan Mingyu, Lü Zhengyang, Li Wenming, Ye Xiaochun, Fan Dongrui, Tang Zhimin
    Journal of Computer Research and Development    2021, 58 (6): 1204-1229.   DOI: 10.7544/issn1000-1239.2021.20210166
    Recently, the emerging graph neural networks (GNNs) have received extensive attention from academia and industry due to their powerful graph learning and reasoning capabilities, and are considered to be the core force that promotes the field of artificial intelligence into the “cognitive intelligence” stage. Since GNNs integrate the execution processes of both traditional graph processing and neural networks, a hybrid execution pattern naturally exists, in which irregular and regular computation and memory access behaviors coexist. This execution pattern makes traditional processors and existing graph processing and neural network acceleration architectures unable to cope with the two opposing execution behaviors at the same time, so they cannot meet the acceleration requirements of GNNs. To solve these problems, acceleration architectures tailored for GNNs continue to emerge. They customize computing hardware units and on-chip storage hierarchies for GNNs, optimize computation and memory access behaviors, and have achieved good acceleration results. Based on the challenges faced by GNN acceleration architectures in the design process, this paper systematically analyzes and introduces the overall structure designs and the key optimization technologies in this field from the perspectives of computation, on-chip memory access, and off-chip memory access. Finally, future directions of GNN acceleration architecture design are discussed from different angles, in the hope of inspiring researchers in this field.
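    The hybrid execution pattern mentioned above can be made concrete with a toy, framework-free sketch of one GNN layer: an irregular, edge-list-driven neighbor aggregation followed by a regular dense combination. This is my own illustration (all names hypothetical), not code from any surveyed accelerator.

```python
import numpy as np

def gnn_layer(edges, feats, weight):
    """One simplified GNN layer = irregular aggregation + regular combination.

    edges  : (E, 2) array of (src, dst) pairs, i.e. the graph structure
    feats  : (N, F) node feature matrix
    weight : (F, F_out) dense weight matrix
    """
    # Phase 1: aggregation. Neighbor gathering follows the edge list, so
    # memory accesses are data-dependent and irregular (graph-processing-like).
    agg = np.zeros_like(feats)
    np.add.at(agg, edges[:, 1], feats[edges[:, 0]])
    # Phase 2: combination. A dense matrix multiply with regular, streaming
    # memory accesses, like a conventional neural network layer.
    return np.maximum(agg @ weight, 0.0)       # ReLU

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    edges = rng.integers(0, 100, size=(500, 2))
    feats = rng.standard_normal((100, 16))
    weight = rng.standard_normal((16, 8))
    print(gnn_layer(edges, feats, weight).shape)   # (100, 8)
```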
    Adversarial Attacks and Defenses for Deep Learning Models
    Li Minghui, Jiang Peipei, Wang Qian, Shen Chao, Li Qi
    Journal of Computer Research and Development    2021, 58 (5): 909-926.   DOI: 10.7544/issn1000-1239.2021.20200920
    Deep learning is one of the main representatives of artificial intelligence technology, and it is quietly enhancing our daily lives. However, the deployment of deep learning models has also brought potential security risks. Studying the basic theories and key technologies of attacks and defenses for deep learning models is of great significance for a deep understanding of the inherent vulnerability of the models, comprehensive protection of intelligent systems, and widespread deployment of artificial intelligence applications. This paper discusses the development and future challenges of adversarial attacks and defenses for deep learning models from an adversarial perspective. We first introduce the potential threats faced by deep learning at different stages. Afterwards, we systematically summarize the progress of existing attack and defense technologies in artificial intelligence systems from the perspectives of the essential mechanism of adversarial attacks, the methods of adversarial attack generation, defensive strategies against the attacks, and the framework of attacks and defenses. We also discuss the limitations of related research and propose an attack framework and a defense framework for guidance in building better adversarial attacks and defenses. Finally, we discuss several potential future research directions and challenges for adversarial attacks and defenses against deep learning models.
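    As a minimal, self-contained sketch of the gradient-sign flavor of adversarial example generation (in the spirit of FGSM, applied here to a plain logistic-regression "model" rather than any system from the survey), the example below perturbs an input along the sign of the loss gradient and shows how the prediction score drops. All parameter values are made up.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps=0.1):
    """Gradient-sign perturbation of input x for a logistic-regression model.

    With binary cross-entropy loss, the gradient of the loss w.r.t. the
    input is (p - y) * w; the attack steps in its sign direction.
    """
    p = sigmoid(np.dot(w, x) + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    w = rng.standard_normal(10)        # fixed "model" weights
    b = 0.0
    x = rng.standard_normal(10)        # clean input
    y = 1.0                            # true label
    x_adv = fgsm_perturb(x, y, w, b, eps=0.2)
    print("clean score:", sigmoid(w @ x + b))
    print("adversarial score:", sigmoid(w @ x_adv + b))   # pushed toward 0
```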
    Research and Challenge of Distributed Deep Learning Privacy and Security Attack
    Zhou Chunyi, Chen Dawei, Wang Shang, Fu Anmin, Gao Yansong
    Journal of Computer Research and Development    2021, 58 (5): 927-943.   DOI: 10.7544/issn1000-1239.2021.20200966
    Different from the centralized deep learning mode, distributed deep learning removes the limitation that data must be centralized during model training: data are processed locally, and all participants collaborate without exchanging raw data. It significantly reduces the risk of user privacy leakage, breaks down data silos at the technical level, and improves the efficiency of deep learning. Distributed deep learning can be widely used in smart medical care, smart finance, smart retail, and smart transportation. However, typical attacks such as generative adversarial network attacks, membership inference attacks, and backdoor attacks have revealed that distributed deep learning still has serious privacy vulnerabilities and security threats. This paper first compares and analyzes the characteristics and core problems of the three distributed deep learning modes, namely collaborative learning, federated learning, and split learning. Secondly, from the perspective of privacy attacks, it comprehensively expounds the various types of privacy attacks faced by distributed deep learning and summarizes the existing privacy attack defense methods. At the same time, from the perspective of security attacks, the paper analyzes the attack processes and inherent security threats of three security attacks, namely data poisoning attacks, adversarial example attacks, and backdoor attacks, and analyzes the existing defense technologies from the perspectives of defense principles, adversary capabilities, and defense effects. Finally, from the perspective of privacy and security attacks, future research directions of distributed deep learning are discussed and prospected.
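    For readers unfamiliar with the federated learning mode mentioned above, the toy numpy sketch below shows the collaboration pattern at its core (in the spirit of federated averaging): clients update a simple linear model on local data, and the server aggregates only the weights, never the raw data. The function names and hyperparameters are illustrative assumptions; the privacy and security attacks discussed in the paper target exactly these exchanged model updates.

```python
import numpy as np

def local_update(w, X, y, lr=0.1, steps=10):
    """A client's local gradient steps on a least-squares model."""
    for _ in range(steps):
        grad = 2.0 * X.T @ (X @ w - y) / len(y)
        w = w - lr * grad
    return w

def federated_round(w_global, clients):
    """One round: clients train locally, the server averages the weights."""
    local_weights = [local_update(w_global.copy(), X, y) for X, y in clients]
    return np.mean(local_weights, axis=0)      # only parameters are shared

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_w = np.array([2.0, -1.0, 0.5])
    clients = []
    for _ in range(4):                         # 4 participants with local data
        X = rng.standard_normal((50, 3))
        y = X @ true_w + 0.01 * rng.standard_normal(50)
        clients.append((X, y))
    w = np.zeros(3)
    for _ in range(20):
        w = federated_round(w, clients)
    print("recovered weights:", np.round(w, 2))
```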
    A Review of Fuzzing Techniques
    Ren Zezhong, Zheng Han, Zhang Jiayuan, Wang Wenjie, Feng Tao, Wang He, Zhang Yuqing
    Journal of Computer Research and Development    2021, 58 (5): 944-963.   DOI: 10.7544/issn1000-1239.2021.20201018
    Fuzzing is a security testing technique that is playing an increasingly important role, especially in vulnerability detection. Fuzzing has developed rapidly in recent years and a large number of new achievements have emerged, so it is necessary to summarize and analyze them to follow the research frontier of fuzzing. Based on papers from the four top conferences on network and system security (IEEE S&P, USENIX Security, CCS, NDSS), we summarize fuzzing’s basic workflow, including preprocessing, input building, input selection, evaluation, and post-fuzzing, and discuss the tasks, challenges, and corresponding research results of each step. We particularly analyze coverage-guided fuzzing, represented by the American Fuzzy Lop (AFL) tool and its improvements. Applying fuzzing in different fields poses vastly different challenges; by sorting and analyzing the related literature, we summarize the unique requirements and corresponding solutions for fuzzing in specific areas, focusing mostly on the Internet of Things and kernel security because of their rapid development and importance. In recent years, progress in anti-fuzzing and machine learning technologies has brought both challenges and opportunities to the development of fuzzing, and these opportunities and challenges point to directions for further research.
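    The workflow above can be illustrated with a minimal coverage-guided loop in the spirit of AFL (but not AFL itself): seeds are mutated (input building), a candidate from the corpus is picked (input selection), the target runs (evaluation), and inputs that reach new coverage are kept. Coverage here is faked by a Python function that returns branch ids; the target, the magic value, and all names are hypothetical.

```python
import random

def target(data: bytes):
    """Stand-in for an instrumented program: returns the ids of covered branches."""
    covered = {0}
    magic = b"FUZZ"
    for i, ch in enumerate(magic):
        if len(data) > i and data[i] == ch:
            covered.add(1 + i)          # one branch per matched magic byte
        else:
            break
    if {1, 2, 3, 4} <= covered and b"!" in data[4:]:
        covered.add(5)                  # pretend this branch triggers the bug
    return covered

def mutate(data: bytes) -> bytes:
    """Input building: random bit flip or byte insertion."""
    data = bytearray(data or b"\x00")
    if random.random() < 0.5:
        data[random.randrange(len(data))] ^= 1 << random.randrange(8)
    else:
        data.insert(random.randrange(len(data) + 1), random.randrange(256))
    return bytes(data)

def fuzz(seed: bytes, iterations: int = 300_000):
    corpus, global_cov = [seed], target(seed)
    for _ in range(iterations):
        candidate = mutate(random.choice(corpus))   # input selection
        cov = target(candidate)                     # evaluation
        if not cov <= global_cov:                   # new coverage -> keep it
            corpus.append(candidate)
            global_cov |= cov
    return corpus, global_cov

if __name__ == "__main__":
    random.seed(0)
    corpus, cov = fuzz(b"AAAA")
    print("branches covered:", sorted(cov), "corpus size:", len(corpus))
```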
    Research Progress of Neural Networks Watermarking Technology
    Zhang Yingjun, Chen Kai, Zhou Geng, Lü Peizhuo, Liu Yong, Huang Liang
    Journal of Computer Research and Development    2021, 58 (5): 964-976.   DOI: 10.7544/issn1000-1239.2021.20200978
    With the popularization and application of deep neural networks, trained neural network models have become important assets and are provided to users as machine learning as a service (MLaaS). However, as a special kind of user, attackers can extract the models while using the services. Considering the high value of the models and the risk of theft, service providers are paying more attention to the copyright protection of their models. The main technique, called neural network watermarking, is adapted from digital watermarking and applied to neural networks. In this paper, we first analyze this kind of watermarking and present the basic requirements of its design. Then we introduce the related technologies involved in neural network watermarking. Typically, service providers embed watermarks in their neural networks; once they suspect a model has been stolen from them, they can verify the existence of the watermark in that model. Sometimes the providers can obtain the suspected model and check for the watermark in the model parameters (white-box). But sometimes the providers cannot acquire the model, and all they can do is check the input/output pairs of the suspected model (black-box). We discuss these watermarking methods and potential attacks against the watermarks from the viewpoints of robustness, stealthiness, and security. In the end, we discuss future directions and potential challenges.
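    The black-box verification idea mentioned above can be sketched in a few lines: the owner keeps a secret trigger set with pre-assigned labels and claims ownership if a suspected model, queried only through its predictions, reproduces those labels above a threshold. The trigger set, the threshold value, and the toy models below are illustrative assumptions, not any specific scheme from the survey.

```python
import random

def verify_watermark(predict, trigger_set, threshold=0.9):
    """Black-box watermark check via input/output pairs only.

    predict     : callable mapping an input to a predicted label
    trigger_set : list of (secret_input, expected_label) pairs chosen
                  (and embedded) by the model owner at training time
    threshold   : fraction of matches needed to claim ownership
    """
    matches = sum(1 for x, y in trigger_set if predict(x) == y)
    rate = matches / len(trigger_set)
    return rate >= threshold, rate

if __name__ == "__main__":
    random.seed(0)
    # Hypothetical trigger set: 20 secret inputs with owner-chosen labels.
    trigger_set = [(("trigger", i), i % 10) for i in range(20)]

    def stolen_model(x):          # remembers the embedded watermark
        return x[1] % 10

    def independent_model(x):     # unrelated model, random guesses
        return random.randrange(10)

    print("stolen model:", verify_watermark(stolen_model, trigger_set))
    print("independent model:", verify_watermark(independent_model, trigger_set))
```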
    A Survey of Intelligent Malware Detection on Windows Platform
    Wang Jialai, Zhang Chao, Qi Xuyan, Rong Yi
    Journal of Computer Research and Development    2021, 58 (5): 977-994.   DOI: 10.7544/issn1000-1239.2021.20200964
    In recent years, malware has brought many negative effects to the development of information technology, and how to detect malware effectively has long been a concern. With the rapid development of artificial intelligence, machine learning and deep learning technologies have gradually been introduced into the field of malware detection; such techniques are collectively called intelligent malware detection. Compared with traditional detection methods, intelligent detection does not require manually formulated detection rules, thanks to the application of artificial intelligence. In addition, it has stronger generalization capability and can better detect previously unseen malware. Intelligent malware detection has therefore become a research hotspot in the detection field. This paper introduces current work related to intelligent malware detection, covering the main components of the intelligent detection process. Specifically, we systematically explain and classify related work, including the features commonly used in intelligent detection, how feature processing is performed, the classifiers commonly used, and the main problems faced by current intelligent malware detection. Finally, we summarize the paper and clarify potential future research directions, aiming to contribute to the development of intelligent malware detection.
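    A minimal sketch of the pipeline named above (features, feature processing, classifier): API-call traces are hashed into fixed-length count vectors and fed to an off-the-shelf random forest. The traces, feature dimension, and labels below are fabricated placeholders, and real systems use far richer features than this.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

FEATURE_DIM = 64

def trace_to_features(api_calls):
    """Feature processing: hash a variable-length API-call trace into a
    fixed-length count vector (the "hashing trick")."""
    vec = np.zeros(FEATURE_DIM)
    for call in api_calls:
        vec[hash(call) % FEATURE_DIM] += 1.0
    return vec

if __name__ == "__main__":
    # Toy training data: benign vs. process-injection-like traces.
    benign = [["CreateFile", "ReadFile", "CloseHandle"]] * 30
    malicious = [["OpenProcess", "WriteProcessMemory", "CreateRemoteThread"]] * 30
    X = np.array([trace_to_features(t) for t in benign + malicious])
    y = np.array([0] * len(benign) + [1] * len(malicious))

    clf = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
    sample = trace_to_features(["OpenProcess", "WriteProcessMemory"])
    print("predicted label:", clf.predict([sample])[0])   # 1 = malicious
```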
    Intelligent Requirements Elicitation and Modeling: A Literature Review
    Wang Ye, Chen Junwu, Xia Xin, Jiang Bo
    Journal of Computer Research and Development    2021, 58 (4): 683-705.   DOI: 10.7544/issn1000-1239.2021.20200740
    Requirements elicitation and modeling refer to the process of obtaining explicit or implicit requirements from requirements text described in natural language, and constructing the corresponding models through tabular, graphical, and formulaic methods. Requirements elicitation and modeling is an extremely critical step in the software development process: it paves the way for subsequent system design and implementation, improves the efficiency and quality of software development, and improves the stability and feasibility of software systems. Researchers have obtained a series of research achievements in requirements elicitation and modeling, which can generally be divided into three steps: requirements knowledge extraction, requirements knowledge classification, and requirements model construction. Because traditional requirements elicitation and modeling approaches have problems with the accuracy and efficiency of model construction, in recent years more and more researchers have integrated widely applicable artificial intelligence techniques with these approaches and put forward a series of intelligent requirements elicitation and modeling approaches, so as to make up for the deficiencies of the traditional methods. This paper surveys requirements elicitation and modeling from the perspective of intelligent approaches and summarizes the research progress of recent years. The main contents include: 1) statistics and analysis of the artificial intelligence techniques applied in requirements knowledge extraction, requirements knowledge classification, and requirements model construction; 2) a summary of the verification and evaluation methods used in the process of intelligent requirements elicitation and modeling; 3) a summary of the key issues of intelligent requirements elicitation and modeling from the two aspects of scientific problems and technical difficulties, and an elaboration of six research trends, namely integrated and dynamic model construction, mining the relationships between intelligent requirements elicitation and modeling and other software engineering activities, refining the granularity of intelligent requirements modeling, data set construction, evaluation metric construction, and industrial practice, as possible solutions to the above problems. The future development of intelligent requirements elicitation and modeling research is also discussed.
    A Survey of Cache-Based Side Channel Countermeasure
    Wang Chong, Wei Shuai, Zhang Fan, Song Ke
    Journal of Computer Research and Development    2021, 58 (4): 794-810.   DOI: 10.7544/issn1000-1239.2021.20200500
    Microarchitectural side channel attacks use microarchitectural state to steal information from a victim. They break the isolation offered by the operating system, the sandbox, and so on, which seriously threatens information security and privacy, and thus they have received extensive attention from academia. Unlike traditional side channel attacks, a microarchitectural side channel attack requires neither physical contact nor complex analysis devices; the attacker only needs to co-run some code with the victim on shared resources. Cache-based side channel attacks use caches such as the private L1 cache and the LLC (last level cache) to learn the access patterns of other applications, and use these access patterns to infer secrets. Since caches are widely used in modern CPUs, cache-based side channel attacks are the most attractive of these attacks, and defending against them is still an open challenge. In this paper, we first introduce the basic architecture and theory related to microarchitectural side channels, especially cache-based side channel attacks. Then, we consolidate existing attack methods into an attack model in terms of attacker ability, attack steps, and attack targets. According to this model, we classify the main existing countermeasures against cache-based side channel attacks, and focus on the design of new secure cache architectures. Finally, we present the trends in countermeasures, the challenges in combating these attacks, and future directions, especially new cache architectures.
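    To make the attack idea concrete, the sketch below simulates (rather than measures) the prime+probe pattern behind many cache side channel attacks: the attacker fills the cache sets, the victim touches a set that depends on a secret, and the attacker infers the secret by observing which of its own lines was evicted. Real attacks time memory accesses on shared hardware; here a tiny direct-mapped cache model stands in for that, and all names are hypothetical.

```python
class ToyCache:
    """Direct-mapped cache model: one resident tag per set."""

    def __init__(self, num_sets=8):
        self.num_sets = num_sets
        self.sets = [None] * num_sets

    def access(self, addr, owner):
        s = addr % self.num_sets
        hit = self.sets[s] == (addr, owner)
        self.sets[s] = (addr, owner)          # fill / evict on miss
        return hit

def victim(cache, secret):
    # Secret-dependent memory access, e.g. a table lookup indexed by a key byte.
    cache.access(0x100 + secret, owner="victim")

def prime_and_probe(cache, victim_fn):
    # Prime: the attacker fills every set with its own lines.
    for s in range(cache.num_sets):
        cache.access(s, owner="attacker")
    victim_fn(cache)                           # victim runs on the shared cache
    # Probe: a miss on the attacker's own line reveals which set the victim
    # touched, leaking the secret-dependent index.
    return [s for s in range(cache.num_sets)
            if not cache.access(s, owner="attacker")]

if __name__ == "__main__":
    cache = ToyCache(num_sets=8)
    secret = 5
    evicted = prime_and_probe(cache, lambda c: victim(c, secret))
    print("evicted sets observed by attacker:", evicted)   # [5]
```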
    A Survey on Graph Processing Accelerators
    Yan Mingyu, Li Han, Deng Lei, Hu Xing, Ye Xiaochun, Zhang Zhimin, Fan Dongrui, Xie Yuan
    Journal of Computer Research and Development    2021, 58 (4): 862-887.   DOI: 10.7544/issn1000-1239.2021.20200110
    In the big data era, graphs are used as effective representations of data with complex relationships in many scenarios, and graph processing applications are widely used in various fields to dig out the potential value of graph data. The irregular execution pattern of graph processing applications introduces irregular workloads, intensive read-modify-write updates, irregular memory accesses, and irregular communication, and existing general-purpose architectures cannot effectively handle these challenges. To overcome them, a large number of graph processing accelerator designs have been proposed. They tailor the computation pipeline, memory subsystem, storage subsystem, and communication subsystem to graph processing applications. Thanks to these hardware customizations, graph processing accelerators have achieved significant improvements in performance and energy efficiency compared with state-of-the-art software frameworks running on general-purpose architectures. To give researchers a comprehensive understanding of graph processing accelerators, this paper first classifies and summarizes the customized designs of existing work based on the computer’s pyramid organization structure, from top to bottom. It then discusses accelerator designs for the emerging graph processing application of graph neural networks, with specific graph neural network accelerator cases. In the end, it discusses future design trends of graph processing accelerators.
    Research on Optimal Performance of Sparse Matrix-Vector Multiplication and Convolution Using the Probability-Process-Ram Model
    Xie Zhen, Tan Guangming, Sun Ninghui
    Journal of Computer Research and Development    2021, 58 (3): 445-457.   DOI: 10.7544/issn1000-1239.2021.20180601
    Performance models provide insightful perspectives that allow us to predict performance and propose optimization guidance. Although there has been much research, pinpointing the bottlenecks of various memory access patterns and reaching high performance for both regular and irregular programs on various hardware configurations are still not trivial. In this work, we propose a novel model called probability-process-ram (PPR) to quantify the compute and data transfer time on general-purpose multicore processors. The PPR model predicts the number of instructions for a single core and the probability of memory accesses at each level of the memory hierarchy through a newly designed cache simulator. Using the automatically extracted best optimization method and expected performance, we apply the PPR model to analyze and optimize sparse matrix-vector multiplication and 1D convolution as case studies of typical irregular and regular computational kernels. We then obtain the best block sizes for sparse matrices with various sparsity structures, as well as optimal optimization guidance for 1D convolution with different instruction set support and data sizes. Compared with the Roofline model and the ECM model, the proposed PPR model greatly improves prediction accuracy through the newly designed cache simulator and provides more comprehensive feedback.
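    For reference, the sparse matrix-vector multiplication kernel used as the irregular case study can be written out in a minimal CSR (compressed sparse row) form, which makes its index-driven, data-dependent memory access pattern concrete. This plain numpy sketch deliberately omits the blocking and optimization choices that the PPR model searches over.

```python
import numpy as np

def spmv_csr(row_ptr, col_idx, vals, x):
    """y = A @ x with A stored in CSR.

    row_ptr : (n+1,) start offsets of each row in col_idx/vals
    col_idx : column index of each nonzero (drives irregular reads of x)
    vals    : nonzero values
    """
    n = len(row_ptr) - 1
    y = np.zeros(n)
    for i in range(n):
        for k in range(row_ptr[i], row_ptr[i + 1]):
            y[i] += vals[k] * x[col_idx[k]]   # indirect, data-dependent access
    return y

if __name__ == "__main__":
    # A = [[4, 0, 1],
    #      [0, 2, 0],
    #      [3, 0, 5]]
    row_ptr = np.array([0, 2, 3, 5])
    col_idx = np.array([0, 2, 1, 0, 2])
    vals    = np.array([4.0, 1.0, 2.0, 3.0, 5.0])
    x = np.array([1.0, 1.0, 1.0])
    print(spmv_csr(row_ptr, col_idx, vals, x))   # [5. 2. 8.]
```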
    Bidirectional-Bitmap Based CSR for Reducing Large-Scale Graph Space
    Gan Xinbiao, Tan Wen, Liu Jie
    Journal of Computer Research and Development    2021, 58 (3): 458-466.   DOI: 10.7544/issn1000-1239.2021.20200090
    Graph500 is an important and famous benchmark for evaluating data-intensive applications on supercomputers in the big data era. The graph traversal ability of a pre-exascale system is mainly restricted by memory space and communication bandwidth; in particular, the utilization of memory space ultimately determines the testable graph scale, and the graph scale largely dominates Graph500 performance. Hence, Bi-CSR (bidirectional-bitmap CSR) is proposed, based on CSR (compressed sparse row), for testing Graph500 on the Tianhe pre-exascale system. Bi-CSR reduces large-scale graph space by introducing a row bitmap and a column bitmap to compress the sparse matrix storage of Graph500. The row bitmap based on CSR mainly cuts down graph memory space, while the column bitmap based on CSR not only further reduces memory space but also improves graph traversal by using the arrays of VPEs (vector processing elements) that are optimized and equipped in the Tianhe pre-exascale system; making full use of the VPEs speeds up graph traversal. Experimental results demonstrate that Bi-CSR reduces large-scale graph space by 70% when the Graph500 test input scale is 2^37 on the Tianhe pre-exascale system, achieving 2.131E+12 TEPS (traversed edges per second).
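    The abstract does not spell out the exact Bi-CSR layout, so the sketch below only illustrates the general idea of bitmap-assisted CSR compression: for graphs with many zero-degree vertices, a row bitmap records which rows are non-empty so that row-pointer entries need to be stored only for those rows. This is my own simplification for illustration, not the Bi-CSR format used on the Tianhe system.

```python
import numpy as np

def compress_row_ptr(row_ptr):
    """Keep row-pointer entries only for non-empty rows, plus a row bitmap.

    For sparse graphs, many consecutive row_ptr entries are equal (empty
    rows) and can be dropped; the bitmap records which rows have edges.
    """
    degrees = np.diff(row_ptr)
    row_bitmap = degrees > 0
    packed_ptr = row_ptr[:-1][row_bitmap]      # start offset of each non-empty row
    return row_bitmap, packed_ptr

def row_range(row_bitmap, packed_ptr, nnz, i):
    """Recover the [start, end) nonzero range of row i."""
    if not row_bitmap[i]:
        return 0, 0                            # empty row, no storage needed
    pos = int(np.count_nonzero(row_bitmap[:i]))    # rank query on the bitmap
    start = packed_ptr[pos]
    end = packed_ptr[pos + 1] if pos + 1 < len(packed_ptr) else nnz
    return int(start), int(end)

if __name__ == "__main__":
    # row_ptr of a 6-vertex graph where vertices 1, 3, 4 have no edges.
    row_ptr = np.array([0, 2, 2, 5, 5, 5, 7])
    bitmap, packed = compress_row_ptr(row_ptr)
    print("bitmap:", bitmap.astype(int), "packed row_ptr:", packed)
    print("row 2 range:", row_range(bitmap, packed, nnz=7, i=2))   # (2, 5)
```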
    Review on Text Mining of Electronic Medical Record
    Wu Zongyou, Bai Kunlong, Yang Linrui, Wang Yiqi, Tian Yingjie
    Journal of Computer Research and Development    2021, 58 (3): 513-527.   DOI: 10.7544/issn1000-1239.2021.20200402
    Electronic medical records (EMR), produced with the development of hospital informatization and containing rich medical information and clinical knowledge, play important roles in guiding and assisting clinical decision-making and drug mining. Therefore, how to efficiently mine important information from a large amount of electronic medical records is an essential research topic. In recent years, with the vigorous development of computer technology, especially machine learning and deep learning, data mining in the specialized field of electronic medical records has been raised to a new height. This review aims to guide future development in the field of electronic medical record text mining by analyzing the current status of electronic medical record research. Specifically, the paper begins with the characteristics of electronic medical record data and how to preprocess such data; it then introduces popular models and methods for four typical tasks in electronic medical record data mining (medical named entity recognition, relation extraction, text classification, and smart interview); finally, from the perspective of applying electronic medical record data mining to characteristic diseases, it discusses two specific diseases, diabetes and cardio-cerebrovascular diseases, and gives a brief introduction to the existing application scenarios of electronic medical records.
    Blockchain-Based Data Transparency: Issues and Challenges
    Meng Xiaofeng, Liu Lixin
    Journal of Computer Research and Development    2021, 58 (2): 237-252.   DOI: 10.7544/issn1000-1239.2021.20200017
    With the high-speed development of the Internet of things, wearable devices, and mobile communication technology, large-scale data are continuously generated and converge to multiple data collectors, which influences people’s lives in many ways. Meanwhile, it also causes more and more severe privacy leaks. Traditional privacy-aware mechanisms such as differential privacy, encryption, and anonymization are not enough to deal with this serious situation. What is more, data convergence leads to data monopoly, which seriously hinders the realization of the value of big data. Besides, tampered data, single points of failure in data quality management, and so on may lead to untrustworthy data-driven decision-making. How to use big data correctly has become an important issue. For those reasons, we propose data transparency, aiming to provide a solution for the correct use of big data. Blockchain, which originated from digital currency, has the characteristics of decentralization, transparency, and immutability, and it provides an accountable and secure solution for data transparency. In this paper, we first propose the definition and research dimensions of data transparency from the perspective of the big data life cycle, and analyze and summarize the methods to realize data transparency. Then, we summarize the research progress of blockchain-based data transparency. Finally, we analyze the challenges that may arise in the process of blockchain-based data transparency.
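    The core property blockchain brings to data transparency, tamper-evident accountability, can be sketched with a minimal hash chain: each record of a data-use event commits to the previous one, so any later modification of history is detectable. This is a toy illustration with made-up field names, not a real blockchain client or the paper's system.

```python
import hashlib, json

def append_record(chain, record):
    """Append a data-usage record; each block commits to its predecessor."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    body = {"record": record, "prev_hash": prev_hash}
    digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
    chain.append({**body, "hash": digest})

def verify(chain):
    """Recompute every link; any modified record breaks the chain."""
    prev_hash = "0" * 64
    for block in chain:
        body = {"record": block["record"], "prev_hash": block["prev_hash"]}
        digest = hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()
        if block["prev_hash"] != prev_hash or block["hash"] != digest:
            return False
        prev_hash = block["hash"]
    return True

if __name__ == "__main__":
    log = []
    append_record(log, {"who": "collector-A", "what": "read", "dataset": "wearable-hr"})
    append_record(log, {"who": "collector-B", "what": "share", "dataset": "wearable-hr"})
    print("log valid:", verify(log))           # True
    log[0]["record"]["what"] = "delete"        # tamper with history
    print("after tampering:", verify(log))     # False
```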
    Fairness Research on Deep Learning
    Chen Jinyin, Chen Yipeng, Chen Yiming, Zheng Haibin, Ji Shouling, Shi Jie, Cheng Yao
    Journal of Computer Research and Development    2021, 58 (2): 264-280.   DOI: 10.7544/issn1000-1239.2021.20200758
    Deep learning is an important field of machine learning research and is widely used in industry for its powerful feature extraction capabilities and advanced performance in many applications. However, due to bias in training data labeling and model design, research shows that deep learning may aggravate human bias and discrimination in some applications, which results in unfairness during decision-making and thereby causes negative impacts on both individuals and society. To improve the reliability of deep learning and promote its development in the field of fairness, and based on existing research work, we review the sources of bias in deep learning, debiasing methods for different types of biases, fairness metrics for measuring the effect of debiasing, and currently popular debiasing platforms. In the end, we explore the open issues in the existing fairness research field and future development trends.
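    The fairness metrics mentioned above quantify gaps in a model's behavior across groups. As a small, self-contained illustration (not from any particular platform in the survey), the sketch below computes two common ones, the demographic parity difference and the equal opportunity difference, from predictions, labels, and a binary sensitive attribute; all arrays are made-up placeholders.

```python
import numpy as np

def demographic_parity_diff(y_pred, sensitive):
    """|P(pred=1 | group 0) - P(pred=1 | group 1)|"""
    return abs(y_pred[sensitive == 0].mean() - y_pred[sensitive == 1].mean())

def equal_opportunity_diff(y_pred, y_true, sensitive):
    """Gap in true positive rate between the two groups."""
    tpr = []
    for g in (0, 1):
        mask = (sensitive == g) & (y_true == 1)
        tpr.append(y_pred[mask].mean())
    return abs(tpr[0] - tpr[1])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n = 1000
    sensitive = rng.integers(0, 2, n)              # protected attribute
    y_true = rng.integers(0, 2, n)
    # A biased toy classifier: more positive predictions for group 1.
    y_pred = (rng.random(n) < np.where(sensitive == 1, 0.7, 0.4)).astype(int)
    print("demographic parity diff:", round(demographic_parity_diff(y_pred, sensitive), 3))
    print("equal opportunity diff:", round(equal_opportunity_diff(y_pred, y_true, sensitive), 3))
```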
    Survey on Automatic Text Summarization
    Li Jinpeng, Zhang Chuang, Chen Xiaojun, Hu Yue, Liao Pengcheng
    Journal of Computer Research and Development    2021, 58 (1): 1-21.   DOI: 10.7544/issn1000-1239.2021.20190785
    In recent years, the rapid development of Internet technology has greatly facilitated people’s daily lives, and massive amounts of information inevitably erupt in a blowout fashion. How to quickly and effectively obtain the required information on the Internet is an urgent problem, and automatic text summarization technology can effectively alleviate it. As one of the most important fields in natural language processing and artificial intelligence, it can automatically produce, by computer, a concise and coherent summary from a long text or a text collection, where the summary should accurately reflect the central themes of the source text. In this paper, we expound the connotation of automatic summarization, review the development of automatic text summarization techniques, and introduce the two main families of techniques in detail: extractive and abstractive summarization, including feature scoring, classification methods, linear programming, submodular functions, graph ranking, sequence labeling, heuristic algorithms, deep learning, etc. We also analyze the datasets and evaluation metrics that are commonly used in automatic summarization. Finally, the challenges ahead and the future trends of research and application are discussed.
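    A minimal sketch of the extractive "feature scoring" idea listed above: sentences are scored by the document-level frequency of their content words, and the top-scoring ones are returned in their original order. This is a toy heuristic for illustration, not any specific system from the survey; the stopword list and splitting rules are simplistic assumptions.

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "it", "that", "for"}

def extractive_summary(text, k=2):
    """Score each sentence by the document-level frequency of its words."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    freq = Counter(words)

    def score(sent):
        toks = [w for w in re.findall(r"[a-z]+", sent.lower()) if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    top = sorted(sentences, key=score, reverse=True)[:k]
    return " ".join(s for s in sentences if s in top)   # keep original order

if __name__ == "__main__":
    doc = ("Automatic summarization condenses a long text into a short summary. "
           "Extractive methods select salient sentences from the text. "
           "Abstractive methods generate new sentences. "
           "The summary should reflect the central themes of the text.")
    print(extractive_summary(doc, k=2))
```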