ISSN 1000-1239 CN 11-1777/TP

Table of Contents

01 September 2022, Volume 59 Issue 9
Tracking and Querying over Timeseries Data with Schema Evolution
Zhao Xin, Wan Yingge, Liu Yingbo
2022, 59(9):  1869-1886.  doi:10.7544/issn1000-1239.20220012
In the context of the Internet of Things and big data, vast numbers of sensors generate massive time series data on a daily basis. Fast iterations of software releases lead to frequent changes to the schemas of these time series, which makes managing the schema evolution of time series increasingly important. Schema evolution requires managing each version of the data schema, so that no information is lost during schema modification and data can be accessed across multiple schema versions. Existing time series database management systems have limited support for schema evolution, even though schema evolution may occur frequently in this setting. State-of-the-art research and technology for schema evolution mainly focuses on relational databases and struggles with complicated integrity constraints, which are more flexible in time series databases. This paper compares various databases with regard to schema evolution, provides a formal definition of time series and their schemas, and analyzes the process of schema evolution. It designs a data-centric schema evolution tracking and querying system, discusses the key problems of schema tracking and cross-schema-version querying in detail, and implements and tests the system on the time series database Apache IoTDB. Finally, the performance of the system is evaluated and future research is discussed.
An Exploratory Adaptive FSM Test Method of Intelligent Service Terminal
Nie Yuge, Yin Beibei, Pei Hanyu, Li Li, Xu Lixin
2022, 59(9):  1887-1901.  doi:10.7544/issn1000-1239.20220023
With the advent of the intelligent era, intelligent service terminals such as automatic beverage vending machines, automatic subway ticketing machines, and ATMs play an increasingly important role in our lives. It is therefore essential to test them comprehensively and effectively to prevent possible errors and improve the user experience. Frequent software version updates, poor coordination between development and testing, and testing during development make the testing workload huge and hard to standardize. Exploiting the fact that intelligent service terminals have obvious states and state migrations, we put forward exploratory adaptive finite state machine (FSM) testing, a test scheme that remains efficient when detailed specifications are absent or when rapid software iteration requires continuous regression testing. First, the state and migration information of the system under test is obtained through exploratory testing and modeled as an FSM. Then, according to the model and the test cases already executed, new test cases are generated based on state and state migration coverage, and the test model and corresponding test cases are adjusted adaptively throughout the testing process. Based on this method, an experimental platform is built by integrating the open source software Graphwalker. Ten different kinds of common intelligent service terminals are selected to evaluate the method's effectiveness experimentally. The results show that the method generates a small number of test cases while achieving a high degree of test adequacy, and that it can efficiently find defects and problems in intelligent service terminal systems.
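The idea of generating test cases from state migration coverage can be illustrated with a minimal Python sketch. It is not the authors' algorithm and does not use Graphwalker; the vending-machine model, the function names, and the greedy "walk to the nearest uncovered migration" strategy are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical FSM of a vending machine: state -> {event: next_state}.
VENDING_FSM = {
    "idle": {"coin": "paid"},
    "paid": {"select": "dispensing", "refund": "idle"},
    "dispensing": {"take": "idle"},
}

def path_to_uncovered(fsm, state, covered):
    """Breadth-first search for the shortest path ending in an uncovered migration."""
    queue = deque([(state, [])])
    seen = {state}
    while queue:
        current, path = queue.popleft()
        for event, target in fsm.get(current, {}).items():
            step = (current, event, target)
            if step not in covered:
                return path + [step]
            if target not in seen:
                seen.add(target)
                queue.append((target, path + [step]))
    return None  # every reachable migration is already covered

def generate_tests(fsm, start, max_len=10):
    """Greedily build event sequences until all reachable migrations are covered.

    max_len is a soft limit: a test case is closed once it reaches this length.
    """
    covered, tests = set(), []
    while True:
        state, case = start, []
        while len(case) < max_len:
            path = path_to_uncovered(fsm, state, covered)
            if path is None:
                break
            for step in path:
                covered.add(step)
                case.append(step[1])
            state = path[-1][2]
        if not case:
            break
        tests.append(case)
    return tests, covered
```

Calling `generate_tests(VENDING_FSM, "idle")` returns the generated event sequences together with the set of covered migrations. An adaptive variant would re-run the generation whenever exploratory testing adds a state or migration to the model.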
A Networked Software Optimization Mechanism Based on Gradient-Play
Shu Chang, Li Qingshan, Wang Lu, Wang Ziqi, Ji Yajiang
2022, 59(9):  1902-1913.  doi:10.7544/issn1000-1239.20220016
Networked software is a novel type of system that deploys services on different devices and runs over the Internet. To improve service efficiency and realize a greater variety of functions, more software developers prefer to build systems in this way. However, the highly distributed nature of such software hinders its optimization. This paper aims to solve the optimization decision problem of networked software using game theory. Each software node exchanges information with the nodes connected to it and adjusts its state for a better payoff, with the goal of improving overall system performance. In this process, we apply a consensus-based method to overcome the communication problems that exist in networked software systems. With this method, each software node can make optimization decisions from incomplete system information. In addition, we propose an adaptive step size mechanism and a forced coordination mechanism to adjust parameters reasonably. These two mechanisms alleviate the problem of divergence and reduce the difficulty of parameter selection in this kind of method, so that an efficient synergy between state optimization and node coordination can be realized. The experiments show that, with the two proposed mechanisms, the original method converges to a Nash equilibrium more efficiently.
Deep Learning Based Data Race Detection Approach
Zhang Yang, Qiao Liu, Dong Chunhao, Gao Hongbin
2022, 59(9):  1914-1928.  doi:10.7544/issn1000-1239.20220014
Existing deep-learning-based approaches to data race detection suffer from single feature extraction and low accuracy. To improve on the state of the art, a novel approach called DeleRace is proposed to detect data races with a deep learning model. First, DeleRace extracts instruction-level, method-level, and file-level features from a variety of real-world applications using the static analysis tool WALA. These features are transformed by word vectorization to build the training dataset. Second, ConRacer, an existing data race detection tool, is employed to identify real races, and the positive samples in the training dataset are labelled on that basis. To further optimize the dataset, DeleRace uses the SMOTE algorithm to balance the positive and negative samples. Finally, a CNN-LSTM model is constructed and a classifier is trained to detect data races. In the experiments, a total of 26 real-world applications are selected from different fields in the DaCapo, JGF, IBM Contest, and PJBench benchmark suites. The experimental results show that the accuracy of DeleRace is 96.79%, which is 4.65% higher than that of existing deep-learning-based approaches. Furthermore, DeleRace is compared with both dynamic tools (such as Said and RVPredict) and static tools (such as SRD and ConRacer), which demonstrates its effectiveness.
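The balancing step can be pictured with a minimal pure-Python sketch of the core SMOTE idea: a synthetic minority sample is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name, parameters, and toy data are assumptions for illustration; how DeleRace configures SMOTE is not stated in the abstract.

```python
import random

def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic samples by interpolating between minority samples.

    minority: list of feature vectors (lists of floats), at least two of them.
    k: number of nearest minority neighbours considered for each base sample.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, other)])
    return synthetic
```

Because every synthetic point is a convex combination of two real minority samples, it always stays inside the region spanned by the minority class.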
Evaluating the Fitness of Model Deviation Detection Approaches on Self-Adaptive Software Systems
Tong Yanxiang, Qin Yi, Ma Xiaoxing
2022, 59(9):  1929-1946.  doi:10.7544/issn1000-1239.20220015
Model deviations in self-adaptive software systems cause critical reliability issues. For control-based self-adaptive systems, model deviation is rooted in the drift of the managed system’s nominal model in uncertain running environments, which invalidates the provided formal guarantees and may lead to abnormal system behavior. Existing model deviation detection approaches often ignore the characteristics of model deviations that emerge in different scenarios, which makes it difficult for users to choose an appropriate approach for a specific application scenario. We provide a framework to describe different detection approaches and propose three metrics to evaluate a detection approach’s fitness with respect to different types of model deviations. Following the process of model deviation detection, the framework is composed of four parts: system modelling, detection variable estimation, model deviation representation, and model deviation judgement. The proposed metrics, namely control-signal-intensity, environmental-input-intensity, and uncertainty-intensity, each concern one of three key factors in the process of model deviation detection. Using these metrics, a deviation scenario is quantified as a vector and classified by the quantified values into a characteristic scenario according to control theory. A number of experiments are conducted to study the effectiveness of four mainstream model deviation detection approaches in different scenarios, and their fitness to different characteristic scenarios of model deviations is summarized.
Survey on Key Technologies of New Generation Knowledge Graph
Wang Meng, Wang Haofen, Li Bohan, Zhao Xiang, Wang Xin
2022, 59(9):  1947-1965.  doi:10.7544/issn1000-1239.20210829
Over the past decade, the development of artificial intelligence has entered a critical period of transition from perceptual intelligence to cognitive intelligence. The knowledge graph, as the core technique of knowledge engineering in the era of big data, combines symbolism and connectionism and is the cornerstone of realizing cognitive intelligence. It provides an effective solution for knowledge organization and intelligent applications in the Internet era. In recent years, progress has been made in the key technologies and theories of knowledge graphs, and typical knowledge-graph-based information system applications have gradually entered various industries, including intelligent question answering, recommendation systems, and personal assistants. However, in the context of the big data environment and China's new infrastructure, the growth of multi-modal data and new modes of interaction have raised new demands and brought new challenges to the new generation of knowledge graphs in terms of basic theory, architecture, and key technologies. We summarize the research and development status of key technologies of the new generation knowledge graph at home and abroad, including unstructured multi-modal data organization and understanding, large-scale dynamic knowledge graph representation learning and pre-training models, and neural-symbolic knowledge inference, and we compare and analyze the latest research progress. Finally, future technical challenges and research directions are outlined.
A Representation Learning Method of Knowledge Graph Integrating Relation Path and Entity Description Information
Ning Yuanlong, Zhou Gang, Lu Jicang, Yang Dawei, Zhang Tian
2022, 59(9):  1966-1979.  doi:10.7544/issn1000-1239.20210651
Knowledge graph representation learning aims to map the entities and relationships of a knowledge graph into a continuous low-dimensional vector space to obtain their vector representations. Most existing knowledge graph representation learning methods consider only the single-step relationships between entities from the perspective of triples and fail to make effective use of important information such as multi-step relationship paths and entity descriptions, which limits performance. In response to these problems, we propose a knowledge graph representation learning model (PDRL) that integrates relationship paths and entity descriptions. First, the multi-step relationship paths in the knowledge graph are represented jointly, and the representation of the relationship path information is obtained by adding all the relationships and entities on the path. Second, the BERT model is used to encode entity description information and obtain the corresponding semantic representation. Finally, fusion training is performed on the triples in the knowledge graph, the semantic representations of the entity descriptions, and the representations of the relationship paths to obtain the fused vector representation. On the FB15K, WN18, FB15K-237, and WN18RR data sets, the proposed model and the benchmark models are used to perform link prediction and triple classification tasks. The experimental results show that, compared with the existing benchmark models, the proposed model achieves higher performance on both tasks, which demonstrates the effectiveness of the method.
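The additive path representation described in the first step can be sketched in a few lines of Python. This is only an illustration of "adding all the relationships and entities on the path"; the function name, the toy embeddings, and the choice to sum only the intermediate entities are assumptions, since the abstract does not specify these details.

```python
def path_representation(relations, entities, relation_vecs, entity_vecs):
    """Sum the embeddings of the relations and entities that lie on a path.

    relations: relation names along the path, in order.
    entities: names of the entities on the path (assumed: intermediate ones).
    relation_vecs / entity_vecs: dicts mapping names to equal-length vectors.
    """
    vectors = [relation_vecs[r] for r in relations]
    vectors += [entity_vecs[e] for e in entities]
    dim = len(vectors[0])
    # Element-wise addition over all collected vectors.
    return [sum(v[i] for v in vectors) for i in range(dim)]
```

For a hypothetical two-step path with relations `born_in` and `city_of` through the intermediate entity `Paris`, the result is simply the element-wise sum of those three embeddings.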
An Alleviate Exposure Bias Method in Joint Extraction of Entities and Relations
Wang Zhen, Fan Hongjie, Liu Junfei
2022, 59(9):  1980-1992.  doi:10.7544/issn1000-1239.20210078
Joint extraction of entities and relations aims to discover entity mentions and relational facts simultaneously from unstructured text. It is a critical step in knowledge graph construction and serves as a basis for many high-level tasks in natural language processing. Joint extraction models have attracted widespread attention because they can model the correlation between entity recognition and relation extraction more effectively. Most existing work uses a phased joint extraction method to handle triple extraction in text that contains multiple triples with overlapping entities. Although reasonable performance improvements have been achieved, these methods suffer from serious exposure bias. In this paper, we propose a novel method called fusional relation expression embedding (FREE) to tackle the exposure bias problem by fusing relation expression information. In addition, a novel feature fusion layer called conditional layer normalization is proposed to fuse prior information more effectively. We conduct extensive comparative experiments on two widely used data sets. In-depth analysis of the experimental results shows that the proposed method has significant advantages over the current state-of-the-art baseline model: it handles various situations more effectively and, without sacrificing efficiency, achieves performance competitive with current advanced models that target the exposure bias problem.
Cross-Modal Retrieval with Correlation Feature Propagation
Zhang Lu, Cao Feng, Liang Xinyan, Qian Yuhua
2022, 59(9):  1993-2002.  doi:10.7544/issn1000-1239.20210475
With the rapid development of deep learning and in-depth research on correlation learning, the performance of cross-modal retrieval has improved greatly. The challenge in cross-modal retrieval is that data of different modalities are related in high-level semantics but separated by a heterogeneous gap in low-level features. Existing methods mainly address this gap by mapping the features of different modalities into a feature space with a certain correlation under a single correlation constraint. However, representation learning shows that features from different layers can help improve the final performance of a model. The correlation of the single feature space learned by existing methods is therefore weak; that is, the feature space may not be the optimal retrieval space. To solve this problem, we propose a model of cross-modal retrieval with correlation feature propagation. Its basic idea is to strengthen the correlation between the layers of the deep network: features of an earlier layer that already have a certain correlation are passed to the next layer through nonlinear transformations, which makes it easier to find a feature space in which the two modalities are more correlated. Extensive experiments on the Wikipedia and Pascal data sets show that this method improves mean average precision.
Weakly-Supervised Contrastive Learning Framework for Few-Shot Sentiment Classification Tasks
Lu Shaoshuai, Chen Long, Lu Guangyue, Guan Ziyu, Xie Fei
2022, 59(9):  2003-2014.  doi:10.7544/issn1000-1239.20210699
Text sentiment classification is a challenging research topic in natural language processing. Lexicon-based methods and traditional machine-learning-based methods rely on high-quality sentiment lexicons and robust feature engineering, respectively, whereas most deep learning methods rely heavily on large human-annotated data sets. Fortunately, users on various social platforms generate massive amounts of tagged opinionated text, which can be treated as weakly labeled data for sentiment classification. However, noisily labeled instances in weakly labeled data have a negative impact on training. In this paper, we present a weakly-supervised contrastive learning framework for few-shot sentiment classification that learns sentiment semantics from large user-tagged data with noisy labels while also exploiting the inter-class contrastive patterns hidden in small labeled data. The framework consists of two steps: first, a weakly-supervised pre-training strategy reduces the influence of noisily labeled samples; then, a contrastive strategy is used in supervised fine-tuning to capture the contrast patterns in the small labeled data. Experimental results on the Amazon review data set show that our approach outperforms the other baseline methods. When fine-tuned on only 0.5% of the labels (i.e., 32 samples), it achieves performance comparable to that of the deep baselines, showing its robustness under data sparsity.
Cross-Domain Trust Prediction Based on tri-training and Extreme Learning Machine
Wang Yan, Tong Xiangrong
2022, 59(9):  2015-2026.  doi:10.7544/issn1000-1239.20210467
Trust prediction is often used in recommendation systems and trading platforms. Most scholars study trust prediction based on a single network. At present, however, most networks lack labels, so it is necessary to predict the social relationships of one network by means of another. Building a model with a BP neural network combined with asymmetric tri-training has two problems. First, the BP neural network takes a long time to backpropagate the error adjustment. Second, the model has only two classifiers to generate pseudo-labels, which requires an expert-set threshold. To address these structure and speed problems, an improved cross-domain trust prediction model based on tri-training and the extreme learning machine is proposed, which combines the tri-training model and the asymmetric tri-training model in a manner similar to transfer learning to predict the network. The model uses the faster extreme learning machine as its classifier, the tri-training model to generate pseudo-labels, and a “minority obeys majority” voting mechanism. The experiments test the effect of adding special features and compare the algorithm with existing algorithms on six data sets. The results show that the model is superior to the other algorithms in terms of recall and stability.
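The "minority obeys majority" pseudo-labelling step can be sketched as follows. This is a minimal illustration, not the paper's model: the classifiers here are hypothetical threshold functions standing in for trained extreme learning machines, and the function names are assumptions.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label chosen by a strict majority, or None if there is none."""
    label, count = Counter(predictions).most_common(1)[0]
    return label if count > len(predictions) / 2 else None

def pseudo_label(samples, classifiers):
    """Attach a pseudo-label to each sample on which a majority of classifiers agree."""
    labelled = []
    for x in samples:
        label = majority_vote([clf(x) for clf in classifiers])
        if label is not None:
            labelled.append((x, label))
    return labelled

# Hypothetical stand-ins for three trained classifiers (1 = trust, 0 = no trust).
CLASSIFIERS = [
    lambda x: int(x > 0.0),
    lambda x: int(x > 1.0),
    lambda x: int(x > 2.0),
]
```

With three binary classifiers a majority always exists, so no expert-set confidence threshold is needed to decide which pseudo-labels to keep, which matches the motivation stated in the abstract.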
Optimal Scale Selection for Generalized Multi-Scale Set-Valued Decision Systems
Hu Jun, Chen Yan, Zhang Qinghua, Wang Guoyin
2022, 59(9):  2027-2038.  doi:10.7544/issn1000-1239.20210196
Multi-scale information systems were proposed to observe, represent, analyze, and make decisions on the same object at different granularities. Because an object may take multiple values at each scale of an attribute, the multi-scale information system was further extended to the multi-scale set-valued information system. However, existing research on multi-scale set-valued information systems assumes that all attributes have the same number of scales, so attributes can only be combined at the same scale. Moreover, optimal scale selection considers only the consistency or uncertainty of the decision system and ignores the cost of practical application. To solve these problems, a generalized multi-scale set-valued decision system with cost is defined, and the trends in uncertainty and cost of the decision system under different scale combinations are analyzed. Then, to improve time efficiency, a scale space updating method based on three-way decisions is proposed. Finally, an optimal scale selection method is proposed to minimize uncertainty and cost according to users’ requirements. The experimental results show that the proposed method not only obtains the optimal scale by combining uncertainty and cost, but also effectively improves computational efficiency compared with the lattice model (LM) method.
An Approach for Training Moral Agents via Reinforcement Learning
Gu Tianlong, Gao Hui, Li Long, Bao Xuguang, Li Yunhui
2022, 59(9):  2039-2050.  doi:10.7544/issn1000-1239.20210474
Artificial agents such as autonomous vehicles and healthcare robots are playing an increasingly important role in human life, and the moral issues they raise have attracted growing concern. To enable agents to comply with basic human ethical norms, a novel approach for training artificial moral agents is proposed based on crowdsourcing and reinforcement learning. First, crowdsourcing is used to obtain sample data sets of human behaviors, and text clustering and association analysis are used to generate plot graphs and trajectory trees, which define a basic behavior space for agents and present the sequence of behaviors. Second, the concept of meta-ethical behavior is proposed, which expands the behavior space of agents by summarizing similar behaviors in different scenarios, and nine kinds of meta-ethical behaviors are extracted from the Code of Daily Behavior of Middle School Students. Finally, a behavior grading mechanism and the corresponding reward and punishment function for reinforcement learning are proposed. By simulating drug purchase scenarios from human life, the Q-learning algorithm and the DQN (deep Q-network) algorithm are each used to complete training experiments for moral agents. Experimental results show that the trained agents can complete the expected tasks in an ethical manner, which verifies the rationality and effectiveness of the method.
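How a reward and punishment function steers Q-learning toward ethical behavior can be shown with a minimal tabular sketch. The toy drug-purchase environment, its states, actions, and reward values are all assumptions invented for illustration; only the standard Q-learning update rule is taken as given.

```python
import random

# Hypothetical environment: (state, action) -> (next_state, reward).
# "steal" also completes the task (+10) but incurs an assumed punishment of -20.
TRANSITIONS = {
    ("home", "go"): ("pharmacy", -1.0),
    ("pharmacy", "pay"): ("done", 10.0),
    ("pharmacy", "steal"): ("done", 10.0 - 20.0),
}
ACTIONS = {"home": ["go"], "pharmacy": ["pay", "steal"], "done": []}

def train(episodes=200, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = random.Random(seed)
    q = {key: 0.0 for key in TRANSITIONS}
    for _ in range(episodes):
        state = "home"
        while state != "done":
            actions = ACTIONS[state]
            if rng.random() < epsilon:
                action = rng.choice(actions)  # explore
            else:
                action = max(actions, key=lambda a: q[(state, a)])  # exploit
            next_state, reward = TRANSITIONS[(state, action)]
            # Value of the best action in the next state (0 if terminal).
            best_next = max(
                (q[(next_state, a)] for a in ACTIONS[next_state]), default=0.0
            )
            # Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)]
            )
            state = next_state
    return q
```

After training, the value of `("pharmacy", "pay")` exceeds that of `("pharmacy", "steal")`, so the greedy policy finishes the task without the punished action. A graded punishment, as in the abstract's behavior grading mechanism, would replace the single constant penalty used here.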
Protein-Drug Interaction Prediction Based on Attention Feature Fusion
Hua Yang, Li Jinxing, Feng Zhenhua, Song Xiaoning, Sun Jun, Yu Dongjun
2022, 59(9):  2051-2065.  doi:10.7544/issn1000-1239.20210134
Drugs usually work by inhibiting or activating the activity of certain proteins in the human body, so predicting the interactions between proteins and drugs is very important for screening new drugs. However, carrying out such wet-lab experiments with traditional methods takes a lot of manpower and material resources. To resolve this problem, we propose a protein-drug interaction prediction algorithm based on the self-attention mechanism and multi-drug feature fusion. First, the Morgan fingerprint based on drug molecular structure, the Mol2Vec representation vector, and the features extracted by a message passing network are fused. Second, the fusion results are used to weight the protein features extracted by dense convolution. After that, the self-attention mechanism and a bidirectional gated recurrent unit are used to predict protein-drug interactions by combining the two sets of features. Finally, a usable prediction system based on the trained model is designed, which demonstrates the specific use cases and effects of the proposed method in drug screening for Alzheimer's disease. The experimental results show that the proposed algorithm achieves better prediction performance on the BindingDB, Kinase, Human, and C. elegans datasets than existing prediction methods. The AUC values reach 0.963, 0.937, 0.983, and 0.990 on the four datasets, demonstrating significant superiority over the other algorithms.
Parameterized Fuzzy Decision Implication
Wang Qi, Li Deyu, Zhai Yanhui, Zhang Shaoxia
2022, 59(9):  2066-2074.  doi:10.7544/issn1000-1239.20210539
Intelligent decision making is an important part of artificial intelligence. In formal concept analysis, a decision is represented by a decision implication in a decision context, while a fuzzy decision implication is based on a fuzzy decision context, and its premise and conclusion consist of condition attributes and decision attributes, respectively. Fuzzy decision implication has wider application significance because it avoids the fuzzy attribute implications that occur among condition attributes and among decision attributes. Deterministic, unadjustable knowledge acquisition adapts poorly to practical applications, so unadjustable knowledge discovery methods need to be extended to parameterized, adjustable ones. As two kinds of parameterized strategies, hedges and thresholds play an important role in research on formal concept analysis with fuzzy attributes. Existing fuzzy decision implication models take only the hedge operator into account, which gives poor tunability, and parameterized strategies that consider the threshold are less studied. In this paper, therefore, the complete residuated lattice is used as the reference frame, and the two parameterized strategies of hedge and threshold are introduced into fuzzy decision implication. We study the semantic aspect and prove some basic properties. We then put forward three inference rules for knowledge reasoning based on parameterized fuzzy decision implication and show their soundness and completeness. According to these results, users in real-world settings can choose an appropriate hedge and threshold to acquire knowledge, which enhances the tunability and application value of fuzzy decision implication.
Node Localization Protocol with Adjustable Privacy Protection Capability
Chen Yan, Gao Zhenguo, Wang Haijun, Ouyang Yun, Gou Jin
2022, 59(9):  2075-2088.  doi:10.7544/issn1000-1239.20210009
Privacy-preserving summation (PPS) is a competent node positioning technique with privacy protection capability. However, traditional PPS requires all participating nodes to generate and transmit a set of random interference matrices, which results in excessive network traffic. To address this issue, we propose privacy-preserving summation with k (PPS-k), which randomly designates k nodes to generate and transmit random interference matrices. The generation process of the interference matrices can be changed by adjusting the value of k, which makes PPS-k more flexible than PPS. The node positioning network is composed of several static anchors that know their own positions. The anchors can communicate with each other and send measurements to the target to help it determine its position. We define different scenarios according to where the measurements are stored and design PPS-k-based node localization protocols for each scenario. We also propose the privacy protection rate, an indicator that uses the ratio of the number of extra equations to the number of unknown scalars to evaluate the privacy protection capability of PPS-based techniques. Compared with traditional evaluation criteria, the privacy protection rate eliminates the influence of the dimension of the private information on the result when evaluating an algorithm's privacy protection performance. The simulation results validate the efficiency of the proposed PPS-k-based methods in adjusting traffic and privacy protection capability.
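The trade-off between k and network traffic can be illustrated with a minimal sketch of zero-sum additive masking, a generic technique swapped in here because the abstract does not describe the PPS-k protocol itself. Scalars stand in for the interference matrices, and the function name, the mask range, and the message-counting rule are all assumptions.

```python
import random

def masked_sum(private_values, k, seed=0):
    """Sum private integers while each node publishes only a masked value.

    k randomly designated nodes each generate one random mask per node such
    that the masks from a single generator sum to zero, so they cancel in
    the total. Returns (total, published_values, messages_sent).
    """
    rng = random.Random(seed)
    n = len(private_values)
    designated = rng.sample(range(n), k)
    masks = [0] * n
    messages = 0
    for _ in designated:
        shares = [rng.randrange(-10**6, 10**6) for _ in range(n - 1)]
        shares.append(-sum(shares))  # force this generator's masks to sum to zero
        for node, share in enumerate(shares):
            masks[node] += share
        messages += n - 1  # the generator keeps one share and sends the rest
    published = [value + mask for value, mask in zip(private_values, masks)]
    return sum(published), published, messages
```

In this sketch the number of mask messages grows linearly with k, which mirrors the abstract's point that adjusting k trades traffic against protection; how the real protocol measures protection is captured by the privacy protection rate, which this sketch does not compute.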
Reversible Data Hiding in Encrypted Images Based on Pixel Prediction and Block Labeling
She Xiaomeng, Du Yang, Ma Wenjing, Yin Zhaoxia
2022, 59(9):  2089-2100.  doi:10.7544/issn1000-1239.20210495
Reversible data hiding in encrypted images (RDHEI) is an effective technology that can embed additional data after image encryption, extract the data without error, and recover the image losslessly. It not only enables information transmission but also ensures the security of the transmission carrier. With the development of cloud computing and the growing demand for privacy protection, RDHEI has therefore attracted wide attention in recent years. In this paper, an RDHEI algorithm based on pixel prediction and block labeling is proposed. The algorithm focuses on achieving high embedding capacity and complete reversibility while ensuring security. In the preprocessing step, the median edge predictor is used to calculate the prediction errors. Unlike previous algorithms, the proposed algorithm uses the most significant bit to represent the sign of each prediction error and the remaining bit planes to represent its absolute value. The prediction error bit planes are divided into non-overlapping blocks, which are labeled adaptively. The sparse label map can be compressed effectively via arithmetic coding. In the encryption step, the encryption key is used to generate a pseudo-random matrix that encrypts the original image. In the data hiding step, different methods are used to embed additional data into different types of blocks. Finally, with the corresponding key, the additional data can be extracted without error and the original image can be recovered losslessly. Experimental results show that the proposed algorithm not only ensures security and reversibility but also significantly improves embedding capacity compared with state-of-the-art RDHEI algorithms.
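The preprocessing step can be sketched in Python as follows. The median edge predictor below follows its usual definition (left, above, and upper-left neighbours); the function names, the decision to leave the first row and column unpredicted, and the rejection of errors that do not fit in 7 magnitude bits are assumptions, since the abstract does not say how the paper handles those cases.

```python
def med_predict(left, above, upper_left):
    """Median edge predictor over the three causal neighbours of a pixel."""
    if upper_left >= max(left, above):
        return min(left, above)
    if upper_left <= min(left, above):
        return max(left, above)
    return left + above - upper_left

def prediction_errors(image):
    """Prediction error for every pixel that has all three neighbours.

    image: list of rows of 8-bit grey values. First row/column are skipped.
    Returns a dict mapping (row, col) to the signed error.
    """
    errors = {}
    for r in range(1, len(image)):
        for c in range(1, len(image[r])):
            pred = med_predict(image[r][c - 1], image[r - 1][c], image[r - 1][c - 1])
            errors[(r, c)] = image[r][c] - pred
    return errors

def encode_error(error):
    """Pack a signed error into 8 bits: MSB is the sign, low 7 bits the magnitude."""
    if abs(error) > 127:
        raise ValueError("error does not fit in 7 magnitude bits")
    sign = 1 if error < 0 else 0
    return (sign << 7) | abs(error)

def decode_error(byte):
    """Inverse of encode_error."""
    magnitude = byte & 0x7F
    return -magnitude if byte & 0x80 else magnitude
```

Because natural images are locally smooth, most errors are small, so the high-order magnitude bit planes are mostly zero; that sparsity is what makes block labeling and compression of the label map worthwhile.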