ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 December 2021, Volume 58 Issue 12
Interpretable Few-Shot Learning with Contrastive Constraint
Zhang Lingling, Chen Yiwei, Wu Wenjun, Wei Bifan, Luo Xuan, Chang Xiaojun, Liu Jun
2021, 58(12):  2573-2584.  doi:10.7544/issn1000-1239.2021.20210999
Unlike deep learning with large-scale supervision, few-shot learning aims to learn the characteristics of samples from only a few labeled examples, which is more in line with the visual cognitive mechanism of the human brain. In recent years, few-shot learning has attracted growing attention from researchers. To discover the semantic similarities between the query set (unlabeled images) and the support set (a few labeled images) in the feature embedding space, methods that combine meta-learning and metric learning have emerged and achieved strong performance on few-shot image classification tasks. However, these methods lack interpretability: they cannot provide an explainable reasoning process resembling the human cognitive mechanism. Therefore, we propose a novel interpretable few-shot learning method called INT-FSL, based on the positional attention mechanism, which aims to answer two key questions in few-shot classification: 1) which parts of the unlabeled image play an important role in the classification task; and 2) which class's features are reflected by those key parts. In addition, we design contrastive constraints at the global and local levels in every few-shot meta-task to alleviate the limited supervision by using the internal information of the data. We conduct extensive experiments on three image benchmark datasets. The results show that INT-FSL not only effectively improves classification performance in few-shot learning but also offers good interpretability in the reasoning process.
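As a rough illustration of the metric-learning idea mentioned in this abstract (comparing a query against a small support set in an embedding space), the sketch below classifies a query by its nearest class prototype. It is a generic example, not the INT-FSL method: the two-dimensional embeddings, the labels, and the function names `class_prototypes` and `classify` are all hypothetical, and a real system would obtain the embeddings from a trained network.

```python
import math


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    prototypes = {}
    for label, vectors in support.items():
        dim = len(vectors[0])
        prototypes[label] = [
            sum(v[i] for v in vectors) / len(vectors) for i in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Return the label whose prototype is closest (Euclidean) to the query."""
    return min(prototypes, key=lambda label: math.dist(query, prototypes[label]))


# Hypothetical 2-way 2-shot task with made-up embeddings.
support = {
    "cat": [[1.0, 0.0], [0.8, 0.2]],
    "dog": [[0.0, 1.0], [0.2, 0.8]],
}
prototypes = class_prototypes(support)
prediction = classify([0.9, 0.1], prototypes)
print(prediction)
```

Averaging the support embeddings keeps the classifier free of per-task trainable parameters, which is one reason distance-based approaches are common when only a few labeled examples are available.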
Spatio-Clock Synchronous Constraint Guided Safe Reinforcement Learning for Autonomous Driving
Wang Jinyong, Huang Zhiqiu, Yang Deyan, Xiaowei Huang, Zhu Yi, Hua Gaoyang
2021, 58(12):  2585-2603.  doi:10.7544/issn1000-1239.2021.20211023
Autonomous driving systems integrate complex interactions between hardware and software. To ensure safe and reliable operation, formal methods are used at the design stage to provide rigorous guarantees that logical specifications and safety-critical requirements are satisfied. As a widely employed machine learning architecture, deep reinforcement learning (DRL) focuses on finding an optimal policy that maximizes a cumulative discounted reward by interacting with the environment, and it has been applied to autonomous driving decision-making modules. However, black-box DRL-based autonomous driving systems can neither guarantee safe operation nor offer interpretable reward definitions for complex tasks, especially when they face unfamiliar situations and must reason about a greater number of options. To address these problems, a spatio-clock synchronous constraint is adopted to improve the safety and interpretability of DRL. Firstly, we propose a dedicated formal property specification language for the autonomous driving domain, i.e., a spatio-clock synchronous constraint specification language, and present a domain-specific knowledge requirements specification that is close to natural language, which makes the reward function generation process more interpretable. Secondly, we present domain-specific spatio-clock synchronous automata to describe spatio-clock autonomous behaviors, i.e., controllers related to certain spatio- and clock-critical actions, and present safe state-action space transition systems to guarantee the safety of the DRL optimal policy generation process. Thirdly, based on the formal specification and policy learning, we propose a formal spatio-clock synchronous constraint guided safe reinforcement learning method whose safe reward function is easy to understand. Finally, we demonstrate the effectiveness of the proposed approach through an autonomous lane-changing and overtaking case study in a highway scenario.
NeuroSymbolic Task and Motion Planner for Disassembly Electric Vehicle Batteries
Ren Wei, Wang Zhigang, Yang Hua, Zhang Yisheng, Chen Ming
2021, 58(12):  2604-2617.  doi:10.7544/issn1000-1239.2021.20211002
Establishing a sound electric vehicle battery recycling system is one of the bottlenecks that must be overcome in pursuing the high-quality development of new energy vehicles in China. Disassembly technology will play an important role in developing intelligent, flexible, refined, and highly efficient recycling. Because of the unstructured environment and high uncertainty, battery disassembly is currently performed mainly by humans at a fixed robot-assisted disassembly workstation. This approach is highly inefficient and urgently needs to be upgraded to an automated and intelligent one so that humans are no longer exposed to high-voltage and toxic working conditions. Removing and sorting electric vehicle batteries is a significant challenge for the automation industry, since used batteries come in distinctive specifications that render pre-programming impossible. We propose a novel NeuroSymbolic-based task and motion planning framework for automatically disassembling batteries with robots in unstructured environments. It enables robots to independently locate and loosen battery bolts, with or without obstacles. The method offers advantages in autonomy, scalability, explicability, and learnability, which pave the way for a more accurate and robust robotic system for disassembling electric vehicle battery packs. This study not only provides a solution for intelligently disassembling electric vehicle batteries but also verifies its feasibility through a set of tests in which the robot accomplishes the disassembly task in a complex and dynamic environment.
Interpretable Deep Knowledge Tracing
Liu Kunjia, Li Xinyi, Tang Jiuyang, Zhao Xiang
2021, 58(12):  2618-2629.  doi:10.7544/issn1000-1239.2021.20211021
The task of knowledge tracing involves tracking users’ cognitive states by modeling their exercise-answering sequences, predicting their performance over time, and achieving an intelligent assessment of the users’ knowledge. Current works mainly model the skills related to the exercises while ignoring the rich information contained in the contexts of exercises. Moreover, current deep learning-based methods operate as black boxes, which undermines the explainability of the model. In this paper, we propose an interpretable deep knowledge tracing (IDKT) framework. First, we alleviate the data sparsity problem by using the contextual information of the exercises and skills to obtain more representative exercise and skill representations. Then the hidden knowledge states are fused with these embeddings to learn a personalized attention, which is later used to aggregate neighbor embeddings in the exercise-skill graph. Finally, given a prediction result, an inference path is selected as the explanation based on the personalized attention. Compared with typical existing methods, IDKT not only achieves the best prediction performance but also provides an explanation of the prediction results at the inference path level.
Hierarchical Attention Network Based Interpretable Knowledge Tracing
Sun Jianwen, Zhou Jianpeng, Liu Sannüya, He Feijuan, Tang Yun
2021, 58(12):  2630-2644.  doi:10.7544/issn1000-1239.2021.20210997
Knowledge tracing is a data-driven learner modeling technology that aims to predict learners’ knowledge mastery or future performance based on their historical learning data. Recently, with the support of deep learning algorithms, deep learning-based knowledge tracing has become a research hotspot in the field. However, deep learning-based knowledge tracing models generally behave as ‘black boxes’: their decision-making process and results lack interpretability, which makes it difficult to provide high-value education services such as learning attribution analysis and error-cause backtracking. To address these problems, a Hierarchical Attention network based Knowledge Tracing model (HAKT) is proposed. By mining the multi-dimensional and in-depth semantic associations between questions, a network structure containing three layers of attention (questions, semantics, and elements) is established, in which a graph attention neural network and a self-attention mechanism are used for question representation learning, semantic fusion, and question retrieval. A regularization term that improves model interpretability is introduced into the loss function, together with a trade-off factor that balances the predictive performance and the interpretability of the model. In addition, we define fidelity, an interpretability measurement index for the prediction results, which can quantitatively evaluate model interpretability. Finally, experimental results on 6 benchmark datasets show that our method effectively improves model interpretability.
Dr.Deep: Interpretable Evaluation of Patient Health Status via Clinical Feature’s Context Learning
Ma Liantao, Zhang Chaohe, Jiao Xianfeng, Wang Yasha, Tang Wen, Zhao Junfeng
2021, 58(12):  2645-2659.  doi:10.7544/issn1000-1239.2021.20211022
Deep-learning-based health status representation learning is a fundamental research problem in clinical prediction and has raised much research interest. Existing models have shown superior performance, but they fail to explore personal characteristics and provide fine-grained interpretability thoroughly. In this work, we develop a general health status
Reciprocal-Constrained Interpretable Job Recommendation
Zhu Haiping, Zhao Chengcheng, Liu Qidong, Zheng Qinghua, Zeng Jiangwei, Tian Feng, Chen Yan
2021, 58(12):  2660-2672.  doi:10.7544/issn1000-1239.2021.20211008
Current job recommendation methods for college students, which are based on collaborative filtering and latent factor models, consider only the job interests of students and ignore the requirements of employers, often leading to ‘capability mismatch’. Moreover, most historical employment data store only one employment record per student, which leads to unreliable negative samples and degrades recommendation performance. Additionally, many methods ignore the demand for interpretable recommendation results. To this end, inspired by the idea of multi-task learning, we construct a reciprocal-constrained interpretable job recommendation method. In this method, we introduce an attention mechanism to extract the bidirectional preferences of both students and employers, and then use a fuzzy gate mechanism to adaptively aggregate them in order to alleviate the problem of capability mismatch. Next, we propose a recommendation interpretation module oriented to employer intention and employer characteristics to meet the interpretability demand. We also propose a similarity-based negative sampling method to solve the problem of unreliable negative samples. Experimental results on EMDAU, a real-world undergraduate employment dataset covering five years, indicate that our method outperforms other classic and state-of-the-art recommendation methods, with an improvement of over 6% in AUC. In addition, ablation experiments verify the effectiveness of each module in our method.
Graph Matching Network for Interpretable Complex Question Answering over Knowledge Graphs
Sun Yawei, Cheng Gong, Li Xiao, Qu Yuzhong
2021, 58(12):  2673-2683.  doi:10.7544/issn1000-1239.2021.20211004
Question answering over knowledge graphs is a trending research topic in artificial intelligence. In this task, semantic matching between the structure of a natural language question and that of a knowledge graph is a challenging research problem. Existing works mainly use a sequence-based deep neural encoder to process questions and construct a semantic matching model to compute the similarity between question structures and subgraphs of a knowledge graph. However, they cannot exploit the structure of a complex question, and they lack interpretability. To alleviate these issues, this paper presents a graph matching network (GMN) based method, called TTQA, for answering complex questions over a knowledge graph. The method first constructs, via syntactic parsing, an ungrounded query graph that is independent of the knowledge graph. Then, based on the ungrounded query graph and the knowledge graph, it constructs a grounded query graph that depends on the knowledge graph. In particular, this paper proposes a cross-graph attention GMN that combines a pre-trained language model and a graph neural network to learn the context representation of a query. The context representation enhances the graph matching representation, which helps to predict a grounded query. Experimental results show that TTQA achieves state-of-the-art results on LC-QuAD 1.0 and ComplexWebQuestions 1.1. Ablation studies demonstrate the effectiveness of GMN. In addition, TTQA retains both the ungrounded query and the grounded query to enhance the interpretability of question answering.
FPGA Verification for Heterogeneous Multi-Core Processor
Li Xiaobo, Tang Zhimin, Li Wen
2021, 58(12):  2684-2695.  doi:10.7544/issn1000-1239.2021.20200289
With the development of processor architecture, high-performance heterogeneous multi-core processors are emerging. Because the design of such processors is very complex, a field programmable gate array (FPGA) prototype verification platform is usually built to reduce design risk, shorten the verification cycle, start software development early, and reproduce post-silicon problems; a variety of software and hardware verification and debugging work is then carried out on this platform. This paper presents a method for debugging and verifying a heterogeneous multi-core high-performance processor on a homogeneous FPGA platform. The method effectively exploits the architectural characteristics of the heterogeneous multi-core processor and the symmetry of the homogeneous FPGA platform: it partitions the design across FPGAs hierarchically from the top down and builds the platform from the bottom up. Combining a speed bridge, adaptive delay adjustment, an embedded virtual logic analyzer, and other technologies makes it possible to bring up and deploy the FPGA platform quickly. The proposed multi-core complementary, inter-core replacement simulation method with a debug SHELL can verify the target high-performance heterogeneous multi-core processor quickly and completely. Using the FPGA prototyping platform, we have successfully completed pre-silicon verification, software-hardware co-development and testing, and post-silicon bug reproduction, and have also provided a fast hardware platform for the architecture design of the next-generation processor.
Design and Analysis of Reliability and Availability on Sunway TaihuLight
Gao Jiangang, Hu Jin, Gong Daoyong, Fang Yanfei, Liu Xiao, He Wangquan, Jin Lifeng, Zheng Fang, Li Hongliang
2021, 58(12):  2696-2707.  doi:10.7544/issn1000-1239.2021.20200967
With the rapid growth of system size and integration, reliability and availability have become major challenges in developing exascale computer systems. In this paper, the design and implementation of reliability and availability on Sunway TaihuLight, a leadership-class supercomputer, are thoroughly analyzed. Firstly, the architecture of the Sunway TaihuLight supercomputer is briefly described. Secondly, the reliability improvement techniques and the active and passive fault-tolerance techniques, including fault prediction, active migration, and job local degradation, are presented. Moreover, a fault-tolerance system with multi-level active and passive collaboration is established on Sunway TaihuLight. Thirdly, the overall failure distribution and the main sources of failures are analyzed on the basis of the system failure statistics. Specifically, using three typical lifetime distributions (the exponential, the lognormal, and the Weibull), the paper fits the failure interval distribution on Sunway TaihuLight. The maximum likelihood estimation and the Kolmogorov-Smirnov (K-S) test results indicate that the lognormal distribution best fits the empirical failure data. The failure distribution model of Sunway TaihuLight is established, and the mean time between failures of the system is calculated. Furthermore, the accuracy of the fault prediction is studied, and the performance and time overhead of the fault-tolerance techniques, such as active migration and job local degradation, are analyzed according to the system statistics and application tests. Finally, based on this analysis, several instructive proposals are put forward to enhance the reliability and availability of future exascale supercomputers.
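The fitting procedure described in this abstract (maximum likelihood fits of exponential, lognormal, and Weibull models, compared with the Kolmogorov-Smirnov statistic) can be sketched with the Python standard library alone. This is a minimal sketch under stated assumptions, not the authors' analysis: the failure intervals are synthetic values drawn from a lognormal distribution purely for demonstration, the function names are hypothetical, and the Weibull shape is found by simple bisection on the likelihood equation.

```python
import math
import random
from statistics import NormalDist, fmean, pstdev


def ks_statistic(samples, cdf):
    """Largest gap between the empirical CDF of `samples` and a model CDF."""
    xs = sorted(samples)
    n = len(xs)
    gap = 0.0
    for i, x in enumerate(xs):
        f = cdf(x)
        gap = max(gap, (i + 1) / n - f, f - i / n)
    return gap


def fit_exponential(samples):
    rate = 1.0 / fmean(samples)  # closed-form MLE of the rate
    return lambda x: 1.0 - math.exp(-rate * x)


def fit_lognormal(samples):
    logs = [math.log(x) for x in samples]
    model = NormalDist(fmean(logs), pstdev(logs))  # MLE of mu and sigma
    return lambda x: model.cdf(math.log(x))


def fit_weibull(samples):
    logs = [math.log(x) for x in samples]
    mean_log = fmean(logs)

    def score(k):
        # Likelihood equation for the shape k; it increases with k.
        powers = [x ** k for x in samples]
        weighted = sum(p * l for p, l in zip(powers, logs))
        return weighted / sum(powers) - 1.0 / k - mean_log

    lo, hi = 0.01, 20.0  # assumed bracket for the shape parameter
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if score(mid) > 0.0:
            hi = mid
        else:
            lo = mid
    shape = (lo + hi) / 2.0
    scale = fmean([x ** shape for x in samples]) ** (1.0 / shape)
    return lambda x: 1.0 - math.exp(-((x / scale) ** shape))


# Synthetic failure intervals (NOT Sunway TaihuLight data).
rng = random.Random(0)
intervals = [rng.lognormvariate(3.0, 1.0) for _ in range(500)]

fits = {
    "exponential": fit_exponential,
    "lognormal": fit_lognormal,
    "weibull": fit_weibull,
}
ks = {name: ks_statistic(intervals, fit(intervals)) for name, fit in fits.items()}
best = min(ks, key=ks.get)
print(ks, best)
```

A smaller K-S statistic means the fitted model tracks the empirical distribution more closely; a full test would also compare the statistic against a critical value for the chosen significance level.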
Anomaly Detection and Modeling of Surveillance Video
Yang Fan, Xiao Bin, Yu Zhiwen
2021, 58(12):  2708-2723.  doi:10.7544/issn1000-1239.2021.20200638
With the development of Internet of Things technology, monitoring equipment has been widely deployed in public areas such as traffic arteries, schools and hospitals, shopping malls and supermarkets, and residential buildings. These devices provide unobtrusive security and generate large volumes of surveillance video. Anomaly detection based on surveillance videos involves research efforts in image processing, machine vision, deep learning, and other related fields. Existing review articles summarize the intuitive description and detection of video anomalies only briefly, do not cover the complete research scope of feature representation and modeling for anomaly detection, and categorize methods vaguely. This paper therefore comprehensively analyzes the research on video anomaly detection. Firstly, classical and emerging video anomaly detection algorithms are classified and described in terms of feature representation and modeling. Then, we compare algorithms based on distance, probability, and reconstruction, and analyze the advantages, disadvantages, and characteristics of each model. Furthermore, we summarize the evaluation criteria of existing approaches and propose a new, accurate, and efficient evaluation index. Finally, we introduce the common surveillance video datasets for anomaly detection, summarize the detection results of different algorithms on these datasets, and discuss challenges and future research directions in practical applications.
Survey on Deep Learning Based Crowd Counting
Yu Ying, Zhu Huilin, Qian Jin, Pan Cheng, Miao Duoqian
2021, 58(12):  2724-2747.  doi:10.7544/issn1000-1239.2021.20200699
Crowd counting, which aims to estimate the number, density, or distribution of crowds in images or videos, belongs to the research category of object counting. It has been widely employed in crowd behavior analysis and public safety management to detect crowding or abnormal behavior in time to avoid accidents. Although tremendous efforts have been made over the past decades to enhance the performance of crowd counting algorithms, some long-standing challenges, such as cross-scene counting, perspective distortion, and scale variation, remain unresolved. An emerging research trend is to exploit deep learning technologies for crowd counting, which has proven to be an effective way to address these issues. In this paper, crowd counting models based on deep learning are reviewed, analyzed, and discussed. Firstly, crowd counting models are introduced in detail from the perspective of their principles, steps, and model variants, and the differences between crowd counting models based on traditional methods and those based on deep learning are analyzed. Then the research status of crowd counting based on deep learning is expounded from four aspects: network structure, ground-truth generation, loss function, and evaluation index. Meanwhile, the characteristics of various crowd counting datasets are compared and analyzed. Finally, some future directions of crowd counting are given.
A Heterogeneous Approach for 3D Object Detection
Lü Zhuo, Yao Zhicheng, Jia Yuxiang, Bao Yungang
2021, 58(12):  2748-2759.  doi:10.7544/issn1000-1239.2021.20200595
3D object detection is an important research direction in computer vision and has a wide range of applications in areas such as autonomous driving. Existing cutting-edge works use end-to-end deep learning methods. Although these methods achieve good detection results, they suffer from high algorithm complexity, a large amount of computation, and insufficient real-time performance. Our analysis shows that deep learning is not well suited to some of the subtasks in 3D object detection. For this reason, this paper proposes a 3D object detection scheme based on heterogeneous methods. The scheme uses both deep learning and traditional algorithms and divides the detection process into multiple stages: 1) use deep learning methods to obtain information such as the mask and category of each object from the input image; 2) based on the mask, use a fast clustering method to filter the surface radar points of the target object out of the radar point cloud space; 3) use the object’s mask, category, and radar points to calculate the object’s orientation, bounding box, and other information, thereby realizing 3D object detection. We have implemented this method as a system, which we call HA3D (a heterogeneous approach for 3D object detection). Experiments on the KITTI 3D detection dataset for cars show that, compared with a representative deep learning-based 3D object detection method, HA3D keeps the decline in detection accuracy within the acceptable range (2.0%) while increasing speed by 52.2%, and the ratio of accuracy to computation time increases by 49%. In terms of overall performance, this method has obvious advantages.
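The three figures quoted in this abstract are mutually consistent, which the short check below illustrates. It assumes, beyond what the abstract states, that accuracy falls by the full 2.0% in relative terms and that computation time is inversely proportional to speed; the function name is hypothetical.

```python
def accuracy_time_ratio_gain(accuracy_drop, speedup):
    """Relative change in accuracy/time for a given accuracy drop and speedup.

    Assumes time scales as 1 / speed, so accuracy/time scales as
    (1 - accuracy_drop) * (1 + speedup).
    """
    return (1.0 - accuracy_drop) * (1.0 + speedup) - 1.0


gain = accuracy_time_ratio_gain(accuracy_drop=0.020, speedup=0.522)
print(f"{gain:.1%}")  # prints 49.2%
```

Under these assumptions the accuracy-to-time ratio rises by about 49%, matching the reported figure.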
Survey on Deep Learning Based Facial Attribute Recognition Methods
Lai Xinyu, Chen Si, Yan Yan, Wang Dahan, Zhu Shunzhi
2021, 58(12):  2760-2782.  doi:10.7544/issn1000-1239.2021.20200870
Facial attribute recognition is one of the most popular research topics in computer vision and pattern recognition and is of great significance for analyzing and understanding facial images. It also has wide practical value in many fields, such as image retrieval, face recognition, micro-expression recognition, and recommendation systems. With the rapid development of deep learning, a large number of deep learning based facial attribute recognition (termed DFAR) methods have been put forward by scholars in China and abroad. First, the overall process of facial attribute recognition is described. Then, according to the different mechanisms of model construction, the part-based and holistic DFAR methods are reviewed and discussed in detail. Specifically, the part-based DFAR methods are classified according to whether they adopt the regular area localization technique, while the holistic DFAR methods are distinguished from the perspectives of single-task learning and multi-task learning, where multi-task learning based DFAR methods are further subdivided according to whether an attribute grouping strategy is used. Next, several popular databases and evaluation metrics for facial attribute recognition are introduced, and the performance of state-of-the-art DFAR methods is compared and analyzed. Finally, future research directions for DFAR methods are provided.
Adaptive Virtual Machine Consolidation Method Based on Deep Reinforcement Learning
Yu Xian, Li Zhenyu, Sun Sheng, Zhang Guangxing, Diao Zulong, Xie Gaogang
2021, 58(12):  2783-2797.  doi:10.7544/issn1000-1239.2021.20200366
Optimizing service quality under energy consumption constraints has always been one of the major challenges for virtual machine (VM) resource management in data centers. Although existing work has reduced energy consumption and improved system service quality to a certain extent through VM consolidation, these methods usually struggle to achieve long-term optimal management goals. Moreover, their performance is susceptible to changes in application scenarios, so they are hard to replace and incur high management costs. To address the problems that VM resource management in data centers struggles to achieve long-term optimal energy efficiency and service quality and lacks flexibility in policy adjustment, this paper proposes an adaptive VM consolidation method based on deep reinforcement learning. The method builds an end-to-end decision-making model, from data center system state to VM migration strategy, through state tensor representation, deterministic action output, a convolutional neural network, and a weighted reward mechanism. It also designs an automatic state generation mechanism and an inverting-gradient limitation mechanism to improve the deep deterministic policy gradient algorithm, accelerate the convergence of the VM migration decision-making model, and guarantee approximately optimal management performance. Simulation results based on real VM load data show that, compared with popular VM consolidation methods in open source cloud platforms, this method effectively reduces energy consumption and improves system service quality.
A DAG-Based Network Traffic Scheduler
Shi Yang, Wen Mei, Fei Jiawei, Zhang Chunyuan
2021, 58(12):  2798-2810.  doi:10.7544/issn1000-1239.2021.20200568
Distributed jobs within a datacenter commonly compete for different resources, especially the network. This competition degrades the jobs’ performance and causes datacenters to run at low efficiency. Most previous work on network scheduling lacks knowledge of the detailed requirements of jobs, so the scheduling benefit is limited. In this paper, we develop a new scheduling algorithm that aims to reduce the job completion time (JCT). To achieve this goal, we use the directed acyclic graph (DAG) of each job to build a novel network scheduler. The proposed scheduler formulates the problem as an integer linear programming (ILP) model, and we prove that it can be solved quickly through an equivalent linear programming (LP) model. Finally, experimental results demonstrate that our scheduler returns a solution within a few seconds and accelerates jobs significantly.
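To illustrate why knowledge of a job's DAG is useful to a scheduler, the sketch below computes the critical-path length of a small DAG, which is a lower bound on the job completion time when no stage can start before its dependencies finish. This is a generic illustration rather than the paper's ILP/LP formulation; the stage names, durations, and the function `critical_path_length` are hypothetical.

```python
from graphlib import TopologicalSorter


def critical_path_length(durations, deps):
    """Length of the longest dependency chain in a job DAG.

    `deps` maps every task to the set of tasks it must wait for;
    `durations` maps every task to its running time.
    """
    finish = {}
    # static_order() yields each task only after all of its dependencies.
    for task in TopologicalSorter(deps).static_order():
        start = max((finish[d] for d in deps.get(task, ())), default=0.0)
        finish[task] = start + durations[task]
    return max(finish.values())


# Hypothetical job: reduce waits for shuffle and log; shuffle waits for map.
durations = {"map": 4, "log": 1, "shuffle": 3, "reduce": 2}
deps = {
    "map": set(),
    "log": set(),
    "shuffle": {"map"},
    "reduce": {"shuffle", "log"},
}
jct_lower_bound = critical_path_length(durations, deps)
print(jct_lower_bound)  # 9, via map -> shuffle -> reduce
```

In this example, speeding up the `log` stage cannot shorten the job, whereas speeding up any stage on the `map`, `shuffle`, `reduce` chain can, which is the kind of per-job information a DAG-aware scheduler can exploit.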
Quantum Differential Collision Key Recovery Attack of Multi-Round EM Structure
Zhang Zhongya, Wu Wenling, Zou Jian
2021, 58(12):  2811-2818.  doi:10.7544/issn1000-1239.2021.20200427
The development and application of quantum algorithms have profoundly influenced the design and analysis of cryptographic algorithms. Currently, the Grover search algorithm and Simon's quantum period-finding algorithm are the most widely used algorithms in quantum cryptanalysis. However, the BHT (Brassard, Høyer, Tapp) quantum collision search algorithm, the quantum counterpart of the birthday collision attack, has not yet been applied in cryptanalysis, so studying it is of great significance for the analysis and application of cryptographic algorithms. By analyzing the multi-round EM (Even, Mansour) structure, we study the combination of collision search and differential key recovery attacks under classical and quantum conditions; moreover, we mount a differential collision key recovery attack on the multi-round EM structure and give a quantum version of the attack based on the BHT algorithm. The results demonstrate that, under classical conditions, the time complexity of the differential key recovery attack on the r-round EM structure decreases from $O(2^{p+n})$ to $O(2^{p+n/2})$, a speedup factor of $2^{n/2}$, when the differential probability satisfies $2^{-p} \geq 2^{-n/2}$. Under quantum conditions, when the differential probability satisfies $2^{-p} > 2^{-n/3}$, the time complexity of the differential collision key recovery attack based on BHT collision search is better than that of the attack based on Grover search, which shows the effectiveness of the BHT algorithm in specific cryptanalysis.
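The classical complexity claim in this abstract can be made concrete by working with base-2 exponents. The sketch below is only a worked arithmetic example: the parameter values n = 128 and p = 32 are hypothetical, as is the function name, and it covers only the classical case because the abstract does not state the quantum complexities.

```python
def classical_attack_exponents(n, p):
    """Base-2 exponents of the classical attack cost before and after
    combining differential key recovery with collision search.

    Returns (before, after, speedup) so that the costs are 2**before and
    2**after, valid when the differential probability 2**-p >= 2**-(n/2).
    """
    if p > n / 2:
        raise ValueError("requires 2^-p >= 2^-(n/2), i.e. p <= n/2")
    before = p + n
    after = p + n / 2
    return before, after, before - after


# Hypothetical parameters: block size n = 128, differential probability 2^-32.
before, after, speedup = classical_attack_exponents(n=128, p=32)
print(before, after, speedup)  # 160 96.0 64.0
```

With these illustrative values the cost drops from 2^160 to 2^96, a factor of 2^64, which equals 2^(n/2) as stated.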