ISSN 1000-1239 CN 11-1777/TP

#### Table of Contents

01 January 2023, Volume 60 Issue 1
Loongson Instruction Set Architecture Technology
Hu Weiwu, Wang Wenxiang, Wu Ruiyang, Wang Huandong, Zeng Lu, Xu Chenghua, Gao Xiang, Zhang Fuxin
2023, 60(1):  2-16.  doi:10.7544/issn1000-1239.202220196
This paper introduces the Loongson instruction set architecture (LoongArch), which balances architectural advancement with software compatibility. LoongArch absorbs new features of recent ISA development to improve performance and reduce power consumption. New instructions, runtime environments, and system states are added to LoongArch to accelerate binary translation from x86, ARM, and MIPS binary code to LoongArch binary code. Binary translation systems are built on top of LoongArch to run MIPS Linux applications, x86 Linux and Windows applications, and ARM Android applications. LoongArch is implemented in the 3A5000 four-core CPU product of Loongson Technology Corporation Limited. Performance evaluation of SPEC CPU2006 with the 3A5000 and its FPGA system shows that, with the same micro-architecture, LoongArch performs on average 7% better than MIPS. With the hardware support, binary translation from MIPS to LoongArch incurs no performance loss, and translation from x86 to LoongArch performs 3.6 (int) and 47.0 (fp) times better than the QEMU system. LoongArch has the potential to remove the barrier between different ISAs and to provide a unified platform for a new ecosystem.
Asynchronous Network-on-Chip Architecture for Neuromorphic Processor
Yang Zhijie, Wang Lei, Shi Wei, Peng Linghui, Wang Yao, Xu Weixia
2023, 60(1):  17-29.  doi:10.7544/issn1000-1239.202111032
Neuromorphic processors show extremely high energy efficiency advantages over traditional deep learning processors. A network-on-chip, with its high scalability, high throughput, and high versatility, is generally adopted for on-chip communication and connection in neuromorphic processors. Three problems arise in practice: a synchronous network-on-chip that relies on a global clock tree is hard to bring to timing closure, link delays in an asynchronous network-on-chip must be matched, and electronic design automation tools for implementing and verifying asynchronous networks-on-chip are lacking. To address them, we propose a low-power asynchronous network-on-chip architecture, NosralC, for building a global-asynchronous-local-synchronous multi-core neuromorphic processor. NosralC is implemented with asynchronous links and synchronous routers. Because only a small part of the design is asynchronous, NosralC resembles a synchronous design and can be implemented and validated with existing electronic design automation tools. Experiments show that, compared with a synchronous baseline with the same function, NosralC achieves a 37.5%–38.9% reduction in power consumption, a 5.5%–8.0% reduction in average latency, and a 36.9%–47.6% improvement in energy efficiency when executing the FSDD, DVS128 Gesture, NTI-DIGITS, and NMNIST neuromorphic application datasets, at the cost of less than 6% additional resource overhead and a small performance overhead (a 0.8%–2.4% decrease in throughput). NosralC is verified on a field programmable gate array (FPGA) platform, which demonstrates that it can be implemented.
A WCET Analysis Method for Multi-Core Processors with Multi-Tier Coherence Protocol
Zhu Yi’an, Shi Xianchen, Yao Ye, Li Lian, Ren Pengyuan, Dong Weizhen, Li Jiayu
2023, 60(1):  30-42.  doi:10.7544/issn1000-1239.202111244
Owing to their high parallel computing performance, multi-core processors have become a trend in real-time systems. Compared with single-core processors, WCET (worst-case execution time) analysis of multi-core processors faces greater challenges because of competition for shared resources and interference between parallel tasks. In particular, the cache coherence protocol in multi-core processors makes WCET analysis more complex. For these reasons, we present a multi-tier coherence protocol WCET analysis method for multi-core processors that use the MESI coherence protocol. For multi-core processor architectures with a multi-tier coherence protocol, a multi-level consistency domain is defined, which determines the cores that use the same coherence protocol. According to the access rules of the memory hierarchy, shared data access in multi-core processors is divided into intra-domain access and cross-domain access, and a cache update function is proposed for multi-core processors with a multi-tier coherence protocol. WCET analysis is thus realized for the case of nested multi-tier coherence protocols. The experimental results show that the estimated results are consistent with GEM5 simulation results for different cache configurations, and correlation analysis reveals that the estimated WCET is significantly correlated with the simulation results. Furthermore, the average overestimation rate of this method is 1.30, which is 0.78 lower than that of the representative related work.
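The overestimation figures quoted above can be read with a small worked example. The sketch below assumes that the overestimation rate is the ratio of the statically estimated WCET to the time observed in simulation, which is the usual convention but is not stated in the abstract. The function names and cycle counts are hypothetical and are not taken from the paper.

```python
def overestimation_ratio(estimated_wcet, simulated_time):
    """Ratio of estimated WCET to simulated execution time (assumed definition)."""
    if simulated_time <= 0:
        raise ValueError("simulated_time must be positive")
    return estimated_wcet / simulated_time


def average_overestimation(pairs):
    """Mean ratio over (estimated, simulated) pairs."""
    ratios = [overestimation_ratio(est, sim) for est, sim in pairs]
    return sum(ratios) / len(ratios)


# Hypothetical cycle counts for three benchmarks (illustrative only).
samples = [(1300, 1000), (2600, 2000), (650, 500)]
avg = average_overestimation(samples)

# A safe analysis never underestimates, so every ratio should be at least 1.
is_safe = all(est >= sim for est, sim in samples)
```

Under this reading, a rate of 1.30 that is 0.78 lower than the related work implies a rate of 2.08 for that work.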
Dataflow Architecture Optimization for Low-Precision Neural Networks
Fan Zhihua, Wu Xinxin, Li Wenming, Cao Huawei, An Xuejun, Ye Xiaochun, Fan Dongrui
2023, 60(1):  43-58.  doi:10.7544/issn1000-1239.202111275
The execution model of the dataflow architecture resembles the execution of neural network algorithms and can therefore exploit more parallelism. However, research on dataflow architectures has not kept pace with the development of low-precision neural networks. When low-precision (INT8, INT4, or lower) neural networks are deployed on traditional dataflow architectures, they face three challenges: 1) The data path of the traditional dataflow architecture does not match low-precision data, so the performance and energy efficiency advantages of low-precision neural networks cannot be realized. 2) Vectorized low-precision data must be arranged in order in on-chip memory, but these data are scattered in the off-chip memory hierarchy, which complicates data loading and write-back. The memory access components of the traditional dataflow architecture cannot support this complex memory access pattern efficiently. 3) The traditional dataflow architecture uses double buffering to hide transmission delay. However, when low-precision data are transmitted, the utilization of transmission bandwidth drops significantly, so computation time can no longer cover the data transmission delay and double buffering risks failing, which hurts the performance and energy efficiency of the dataflow architecture. To solve these problems, we optimize the dataflow architecture and design a low-precision neural network accelerator named DPU_Q. First, a flexible and reconfigurable computing unit is designed, which dynamically reconstructs the data path according to the precision flag of the instruction. This allows it to support a variety of low-precision operations efficiently and flexibly, and it further improves the performance and throughput of the architecture. Second, to handle the complex memory access pattern of low-precision data, we design the Scatter engine, which splices and preprocesses low-precision data that are discretely distributed in the off-chip/low-level memory hierarchy to meet the data arrangement format required by the on-chip/high-level memory hierarchy. The Scatter engine also addresses the reduced bandwidth utilization when transmitting low-precision data: the transmission delay does not increase significantly, so it can be completely covered by the double buffering mechanism. Finally, a low-precision neural network scheduling method is proposed that fully reuses weights and activation values, reducing memory access overhead. Experiments show that, compared with a GPU at the same precision (Titan Xp), a state-of-the-art dataflow architecture (Eyeriss), and a state-of-the-art low-precision neural network accelerator (BitFusion), DPU_Q achieves 3.18$\times$, 6.05$\times$, and 1.52$\times$ performance improvement and 4.49$\times$, 1.6$\times$, and 1.13$\times$ energy efficiency improvement, respectively.
Overview of the Frontier Progress of Causal Machine Learning
Li Jianing, Xiong Ruibin, Lan Yanyan, Pang Liang, Guo Jiafeng, Cheng Xueqi
2023, 60(1):  59-84.  doi:10.7544/issn1000-1239.202110780
Machine learning is one of the important technical means of realizing artificial intelligence, and it has important applications in computer vision, natural language processing, search engines, and recommendation systems. Existing machine learning methods often focus on correlations in the data and ignore causality. As application requirements grow, the drawbacks of these methods have gradually become apparent, and they face a series of urgent problems in interpretability, transferability, robustness, and fairness. To solve these problems, researchers have begun to re-examine the necessity of modeling causal relationships, and related methods have become a recent research hotspot. We organize and summarize recent work that applies causal techniques and ideas to practical problems in machine learning, and trace the development of this emerging research direction. First, we briefly introduce the causal theory most closely related to machine learning. Then, we classify and introduce each work according to the problem in machine learning that it addresses, and explain the differences and connections among them in terms of solution ideas and technical means. Finally, we summarize the current state of causal machine learning and discuss future development trends.
Prediction of the Positional Propensity Scores Based on Multi Task Learning
Cao Zelin, Xu Jun, Dong Zhenhua, Wen Jirong
2023, 60(1):  85-94.  doi:10.7544/issn1000-1239.202110853
The distribution of users’ click data during search differs considerably across search scenarios. Existing methods such as CPBM (contextual position based model) predict the positional propensity score for multiple scenarios with a single model, which inevitably reduces prediction accuracy in individual scenarios and weakens the removal of position bias. In this work, an MCPBM (multi-gate contextual position based model) based on multi-task learning is proposed. The model adds an information filtering structure to CPBM to solve the problem of poor prediction accuracy during joint training on multi-scene data. In addition, to alleviate the inconsistent convergence speed of different tasks, we propose an exponentially weighted average dynamic adjustment algorithm, which speeds up MCPBM training and improves its overall prediction performance. The experimental results show that the proposed MCPBM achieves better prediction accuracy than the traditional CPBM when multi-scene data are jointly trained. After MCPBM is used to remove the position bias in the training data, the ranking model trained on the resulting unbiased data improves the AvgRank ranking metric on test data by 1%–5%.
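The abstract does not specify how the exponentially weighted average adjusts task weights, so the following is only a generic sketch of the idea, not the authors' algorithm. It assumes that each task's loss is smoothed with an exponential moving average and that tasks with a larger smoothed loss (slower convergence) receive a proportionally larger weight. The class name `EmaTaskWeighter` and the parameter `beta` are hypothetical.

```python
class EmaTaskWeighter:
    """Generic EMA-based loss weighting for multi-task training (illustrative)."""

    def __init__(self, num_tasks, beta=0.9):
        if not 0.0 <= beta < 1.0:
            raise ValueError("beta must be in [0, 1)")
        self.beta = beta
        self.ema = [None] * num_tasks  # smoothed loss per task

    def update(self, losses):
        """Fold in the latest per-task losses and return new task weights."""
        for i, loss in enumerate(losses):
            if self.ema[i] is None:
                self.ema[i] = float(loss)  # initialize with the first observation
            else:
                self.ema[i] = self.beta * self.ema[i] + (1.0 - self.beta) * float(loss)
        total = sum(self.ema)
        n = len(self.ema)
        if total == 0.0:
            return [1.0] * n  # all tasks converged: fall back to equal weights
        # Weights sum to n, so the overall loss scale stays comparable.
        return [n * value / total for value in self.ema]


weighter = EmaTaskWeighter(num_tasks=2)
first = weighter.update([1.0, 3.0])
second = weighter.update([1.0, 1.0])
```

In a training loop, the returned weights would multiply the per-task losses before they are summed. The smoothing keeps the weights from reacting to a single noisy batch.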
A Novel Encoding for Model-Based Diagnosis
Zhou Huisi, Ouyang Dantong, Tian Xinliang, Zhang Liming
2023, 60(1):  95-102.  doi:10.7544/issn1000-1239.202110794
Model-based diagnosis (MBD), a well-known approach in the AI field, aims to identify the root cause of a diagnosis problem. Since computing diagnoses is computationally challenging, a series of MBD algorithms that modify the model encoding have been presented, such as the Dominator-Oriented Encoding (DOE) approach. In this study, we propose a new encoding process, Observation-Oriented Encoding (OOE), which uses two ideas to simplify the MBD model. First, we filter more edges based on the observations of the system and the outputs of dominated components, which reduces the number of encoded clauses for the diagnosis system and the observations. Second, we filter more components by finding observation-based filtered nodes, which reduces the number of encoded clauses for components. Together, the two ideas reduce the number of encoded clauses efficiently. Experimental evaluations on the ISCAS85 and ITC99 benchmarks, which contain well-known combinational circuits used for MBD algorithms, show that the OOE approach generates smaller weighted conjunctive normal form (WCNF) encodings and makes diagnosis easier for a maximum satisfiability (MaxSAT) solver than DOE, the latest encoding algorithm for MBD, and Basic Encoding (BE), the traditional encoding approach for MBD. In addition, the OOE approach returns a solution in less time than the DOE and BE approaches.
Semi-Supervised Classification Based on Transformed Learning
Kang Zhao, Liu Liang, Han Meng
2023, 60(1):  103-111.  doi:10.7544/issn1000-1239.202110811
In recent years, graph-based semi-supervised classification has been one of the hot research topics in machine learning and pattern recognition. In general, such an algorithm discovers hidden information by constructing a graph and predicts the labels of unlabeled samples based on the structural information of the graph. The performance of semi-supervised classification therefore depends heavily on the quality of the graph, and in particular on the graph construction algorithm and the quality of the data. To address this dependence, we propose semi-supervised classification based on transformed learning (TLSSC). Unlike most existing semi-supervised classification algorithms, which learn the graph from raw features, our algorithm seeks a representation (transformed coefficients) and performs graph learning and label propagation based on the learned representation. In particular, we propose a unified framework that integrates representation learning, graph construction, and label propagation, so that the three are alternately updated and mutually improved, which avoids the sub-optimal solution caused by a low-quality graph. Specifically, the raw features are mapped into a transformed representation by transformed learning, a high-quality graph is then learned by self-expression, and classification is performed by label propagation. Extensive experiments on face and subject data sets show that the proposed algorithm outperforms other state-of-the-art algorithms in most cases.
A Time and Relation-Aware Graph Collaborative Filtering for Cross-Domain Sequential Recommendation
Ren Hao, Liu Baisong, Sun Jinyang, Dong Qian, Qian Jiangbo
2023, 60(1):  112-124.  doi:10.7544/issn1000-1239.202110545
Cross-domain sequential recommendation aims to mine a given user’s preferences from historical interaction sequences in different domains and to predict the next item that the user is most likely to interact with across multiple domains, thereby mitigating the impact of data sparsity on capturing and predicting users’ intents. Inspired by the idea of collaborative filtering, a time and relation-aware graph collaborative filtering algorithm for cross-domain sequential recommendation (TRaGCF) is proposed, which addresses data sparsity by uncovering users’ high-order behavior patterns and by exploiting the bi-directional migration of user behavior patterns across domains. First, we propose a time-aware graph attention (Ta-GAT) mechanism to obtain the cross-domain sequence-level item representation. Then, a user-item interaction bipartite graph within each domain is used to mine users’ preferences, and a relation-aware graph attention (Ra-GAT) mechanism is proposed to learn the item collaborative representation and the user collaborative representation, which lays the foundation for cross-domain transfer of user preferences. Finally, to improve the recommendation results in both domains simultaneously, a user preference feature bi-directional transfer module (PBT) is proposed, which transfers shared user preferences across domains and retains domain-specific preferences. The accuracy and effectiveness of our model are validated on two experimental datasets, Amazon Movie-Book and Food-Kitchen. The experimental results demonstrate the necessity of considering intricate correlations between items when mining users’ intents in a cross-domain sequential recommendation scenario, and they also show the importance of preserving users’ domain-specific preferences in building a comprehensive user portrait when transferring preferences across domains.
Time Series Anomaly Pattern Recognition Based on Adaptive k Nearest Neighbor
Wang Ling, Zhou Nan, Shen Peng
2023, 60(1):  125-139.  doi:10.7544/issn1000-1239.202111062
As a typical form of data, time series are widely used in many research fields. A time series anomaly pattern indicates the emergence of a special situation and is of great significance in many fields. Most existing time series anomaly pattern recognition algorithms simply detect anomaly subsequences, ignore the problem of distinguishing the types of anomaly subsequences, and require many parameters to be set manually. In this paper, an anomaly pattern recognition algorithm based on adaptive k nearest neighbor (APAKN) is proposed. First, the adaptive neighbor value k of each subsequence is determined, and an adaptive distance ratio is introduced to calculate the relative density of the subsequence, from which the anomaly score is derived. Then, an adaptive threshold method based on minimum variance is proposed to determine the anomaly threshold and detect all anomaly subsequences. Finally, the anomaly subsequences are clustered, and the resulting cluster centers are the anomaly patterns with different changing trends. The algorithm solves the density imbalance problem without setting any parameters, and it also simplifies the steps of the traditional density-based anomaly subsequence detection algorithm while achieving good anomaly pattern recognition. Experimental results on 10 UCR data sets show that the proposed algorithm performs well in detecting and clustering anomaly subsequences without setting parameters.
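The abstract names a minimum-variance threshold but gives no formula, so the sketch below shows one common reading rather than the paper's exact method. It assumes an Otsu-style split: the anomaly scores are sorted, and the cut that minimizes the summed within-group variance of the "normal" and "anomalous" groups is taken as the threshold. The function name and the sample scores are hypothetical.

```python
from statistics import pvariance


def min_variance_threshold(scores):
    """Return the cut point that minimizes total within-group variance."""
    ordered = sorted(scores)
    if len(ordered) < 2:
        raise ValueError("need at least two scores to split")
    best_cost = None
    best_threshold = None
    for i in range(1, len(ordered)):
        low, high = ordered[:i], ordered[i:]
        # Size-weighted population variance of each side of the cut.
        cost = len(low) * pvariance(low) + len(high) * pvariance(high)
        if best_cost is None or cost < best_cost:
            best_cost = cost
            # Place the threshold midway between the two neighboring scores.
            best_threshold = (ordered[i - 1] + ordered[i]) / 2
    return best_threshold


# Hypothetical anomaly scores for six subsequences (illustrative only).
scores = [0.10, 0.20, 0.15, 0.12, 0.90, 0.95]
threshold = min_variance_threshold(scores)
anomalies = [s for s in scores if s > threshold]
```

No threshold parameter is supplied by the caller; the cut is derived from the score distribution itself, which matches the parameter-free goal described above.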
Knowledge-Enhanced Graph Encoding Method for Metaphor Detection in Text
Huang Heyan, Liu Xiao, Liu Qian
2023, 60(1):  140-152.  doi:10.7544/issn1000-1239.202110927
Metaphor recognition is one of the essential tasks of semantic understanding in natural language processing, aiming to identify whether one concept is viewed in terms of the properties and characteristics of another. Since pure neural network methods are restricted by the scale of datasets and the sparsity of human annotations, recent work on metaphor recognition explores how to combine knowledge from other tasks and coarse-grained syntactic knowledge with neural network models to obtain more effective feature vectors for sequence coding and modeling in text. However, existing methods ignore word sense knowledge and fine-grained syntactic knowledge, which leads to low utilization of external knowledge and difficulty in modeling complex context. To address these issues, a knowledge-enhanced graph encoding method (KEG) for metaphor detection in text is proposed. The method consists of three parts. In the encoding layer, the sense vector is trained using word sense knowledge and combined with the context vector generated by the pre-trained model to enhance the semantic representation. In the graph layer, the information graph is constructed using fine-grained syntactic knowledge, and the fine-grained context is then calculated. This layer is combined with a graph recurrent neural network, whose state transition is carried out iteratively to obtain the node vectors and the global vector, which represent the words and the sentence, respectively, so as to model the complex context efficiently. In the decoding layer, conditional random fields are used to decode the sequence tags, following the sequence labeling architecture. Experimental results show that this method effectively improves performance on four international public datasets.
Graph Convolution-Enhanced Multi-Channel Decoding Joint Entity and Relation Extraction Model
Qiao Yongpeng, Yu Yaxin, Liu Shuyue, Wang Ziteng, Xia Zifang, Qiao Jiaqi
2023, 60(1):  153-166.  doi:10.7544/issn1000-1239.202110767
Extracting relational triplets from unstructured natural language text is the most critical step in building a large-scale knowledge graph, but existing research still has the following problems: 1) Existing models ignore the relation overlapping caused by multiple triplets sharing the same entity in text; 2) Current joint extraction models based on the encoder-decoder architecture do not fully consider the dependency relationships among words in the text; 3) An excessively long sequence of triplets leads to the accumulation and propagation of errors, which affects the precision and efficiency of relation extraction between entities. To address these problems, a graph convolution-enhanced multi-channel decoding joint entity and relation extraction model (GMCD-JERE) is proposed. First, a BiLSTM is introduced as the model encoder to strengthen the two-way feature fusion of words in the text. Second, the dependency relationships between the words in a sentence are merged through a graph convolution multi-hop mechanism to improve the accuracy of relation classification. Third, through a multi-channel decoding mechanism, the model solves the problem of relation overlapping and at the same time alleviates the effect of error accumulation and propagation. Finally, three current mainstream models are selected for performance comparison, and the results on the NYT (New York Times) dataset show that the accuracy rate, recall rate, and F1 are increased by 4.3%, 5.1%, and 4.8%, respectively. In addition, the extraction order that starts with the relation is verified on the WebNLG (Web natural language generation) dataset.
Private Protocol Reverse Engineering Based on Network Traffic: A Survey
Li Junchen, Cheng Guang, Yang Gangqin
2023, 60(1):  167-190.  doi:10.7544/issn1000-1239.202110722
Protocol reverse engineering is an important way to analyze private protocols. It can infer protocol constraints and specifications with little or no prior knowledge, so it has practical value in malware supervision, protocol fuzz testing, vulnerability detection, the understanding of interaction behavior, and so on. Network traffic characterizes protocol specifications and carries the inherent characteristics of a protocol, so private protocol reverse engineering based on network traffic is well suited to discovering, analyzing, and monitoring private protocols on the network. In this paper, we provide a thorough review of existing work on private protocol reverse engineering based on network traffic. First, an architecture for private protocol reverse engineering based on network traffic is proposed, which includes four steps: pre-inference, protocol format inference, semantic analysis, and protocol state machine inference. The main research tasks of each step are elaborated, and a classification structure centered on the core of each research method is proposed. Second, the method and process of each private protocol reverse engineering approach are described in detail, and a comparative analysis is made from multiple perspectives, including applicable protocol type, core technology, and inference algorithm, giving a systematic overview of existing private protocol reverse engineering based on network traffic. Finally, the shortcomings of existing research and the main influencing factors are summarized, and future research directions and application scenarios of private protocol reverse engineering are discussed.
Research on Cross Technology Communication
Guo Xiuzhen, He Yuan
2023, 60(1):  191-205.  doi:10.7544/issn1000-1239.202110441
The ever-developing Internet of things (IoT) has brought prosperity to wireless sensing and control applications. In many scenarios, different wireless technologies coexist in the shared frequency medium as well as in the same physical space. Such wireless coexistence may lead to serious cross technology interference (CTI) problems, e.g., channel competition, signal collision, and throughput degradation. Beyond traditional methods such as interference avoidance, tolerance, and concurrency mechanisms, direct and timely information exchange among heterogeneous devices is a fundamental requirement for ensuring the usability, interoperability, and reliability of the IoT. Cross technology communication (CTC), which aims to exchange data directly among heterogeneous devices that follow different standards, has therefore become a hot topic in both academia and industry. Most existing research focuses on the enabling technologies of CTC but lacks reflection on and a summary of CTC methods. Based on a survey of recent studies, we first analyze the background and significance of CTC. We then categorize existing methods into two classes, packet-level CTC and physical-level CTC, and introduce the application scenarios of CTC. Finally, we discuss potential research directions in this area, which hold promise for achieving cross-network, cross-frequency, and cross-medium connections.
Cache Side-Channel Attacks and Defenses
Zhang Weijuan, Bai Lu, Ling Yuqing, Lan Xiao, Jia Xiaoqi
2023, 60(1):  206-222.  doi:10.7544/issn1000-1239.202110774
In recent years, with the development of information technology, cache side-channel attack threats in information systems have grown rapidly. Cache side-channel attacks have been evolving for more than 10 years since cache-timing analysis was first proposed to infer encryption keys. In this survey, we review the cache side-channel attack threats in information systems by analyzing the vulnerabilities in the design characteristics of software and hardware. We then summarize the attacks in terms of attack scenario, cache level, attack target, and principle. Furthermore, we compare the attack conditions, advantages, and disadvantages of 7 typical cache side-channel attacks to better understand their principles and applications. We also make a systematic analysis of defense technologies against cache side-channel attacks at the detection stage and the prevention stage, and classify and analyze these technologies according to their defense principles. Finally, we summarize the work of this paper, discuss the research hotspots and development trends of cache side-channel attack and defense under the Internet ecosystem, and point out future research directions, so as to provide a reference for researchers who want to start work in this field.