2018 Vol. 55 No. 11
2018, 55(11): 2343-2360.
DOI: 10.7544/issn1000-1239.2018.20170629
Abstract:
As an unprecedented breakthrough in the field of experimental molecular biology, DNA microarray enables simultaneous monitoring of the expression levels of thousands of genes over many experimental conditions. Studies have shown that analyzing microarray data is essential for discovering gene co-expression networks, designing new types of drugs, preventing disease, and so on. To analyze gene expression datasets, researchers have designed many clustering methods, but these can uncover only a limited amount of useful knowledge. Because a subset of genes may co-regulate and co-express only under a subset of experimental conditions, and may not co-express at the same level, genes can belong to several genetic pathways that are not otherwise apparent. To handle this situation, biclustering methods have been proposed. At the same time, the focus of gene expression analysis has shifted from whole-pattern mining to local pattern discovery, changing the practice of clustering data only on all objects or all attributes. This paper surveys the state of the art, including the definition of local patterns, the types and evaluation criteria of local patterns, and mining and query methods for local patterns. It then summarizes mining criteria based on quantity and quality, along with related software. Further, it identifies the problems in existing algorithms and tools. Finally, future research directions are discussed.
2018, 55(11): 2361-2371.
DOI: 10.7544/issn1000-1239.2018.20170514
Abstract:
The problem of binary classification with imbalanced data appears in many fields and is still not completely solved. In addition to predicting the classification label directly, many applications also care about the probability that data belongs to a certain class. However, much of the existing research focuses mainly on classification performance and neglects probability estimation. The aim of this paper is to improve the performance of class probability estimation (CPE) while ensuring classification performance. A new regression approach is proposed that adopts the generalized linear model as the basic framework and uses the calibration loss function as the objective to optimize. Considering the asymmetry and flexibility of the generalized extreme value (GEV) distribution, we use it to formulate the link function, which benefits binary classification with imbalanced data. For model estimation, because the shape parameter significantly influences modeling precision, two methods are proposed to estimate the shape parameter of the GEV distribution. Experiments on synthetic datasets confirm the accuracy of the shape parameter estimation. Moreover, experimental results on real data suggest that the proposed approach, compared with three other commonly used regression algorithms, performs well in both classification and CPE. In addition, the proposed algorithm also outperforms other optimization algorithms in terms of computational efficiency.
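As background for the GEV link, the following Python sketch (not the authors' implementation; the design matrix, weights, and shape parameter below are purely illustrative) shows how a GEV cumulative distribution function can serve as the inverse link of a generalized linear model, mapping a linear predictor to a class probability.

```python
import numpy as np

def gev_cdf(eta, xi):
    """GEV cumulative distribution function used as an inverse link:
    maps a linear predictor eta to a class probability in (0, 1)."""
    if abs(xi) < 1e-8:                      # Gumbel limit as xi -> 0
        return np.exp(-np.exp(-eta))
    z = np.clip(1.0 + xi * eta, 1e-12, None)  # support constraint 1 + xi*eta > 0
    return np.exp(-z ** (-1.0 / xi))

def predict_proba(X, w, xi):
    """Class-1 probability under a GLM with a GEV link (sketch)."""
    return gev_cdf(X @ w, xi)

# toy usage: a negative shape parameter skews the link, which can help
# when one class is much rarer than the other
X = np.array([[1.0, 0.2], [1.0, -1.5], [1.0, 3.0]])
w = np.array([-0.5, 1.2])
print(predict_proba(X, w, xi=-0.3))
```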
2018, 55(11): 2372-2385.
DOI: 10.7544/issn1000-1239.2018.20180009
Abstract:
Based on the colored traveling salesman problem (CTSP), this paper presents a more widely applicable combinatorial optimization problem (COP) model named the colored bottleneck traveling salesman problem (CBTSP). It can be used to model planning problems with partially overlapped workspaces, such as route planning for persons and vehicles with cooperative and independent tasks. The objective function of these problems differs from that of traveling salesman problems (TSPs), so they cannot be modeled by CTSP. Since CBTSP is an NP-hard problem, nature-inspired algorithms are a good choice for solving it at large scale. On this basis, the paper proposes a nature-inspired algorithm to solve CBTSP: the new algorithm, named PSGA, is a hybrid of particle swarm optimization (PSO), simulated annealing (SA) and the genetic algorithm (GA) based on the IT process. PSGA first uses dual-chromosome coding to generate solutions of CBTSP, and then updates the solutions by using the crossover operator of GA. During this process, the length of the crossover is controlled by the activity intensity, which is affected by the particle radius and the environment temperature. To test the effectiveness of PSGA, experiments are conducted on CBTSP instances ranging from small to large scale, and the extensive experiments show that PSGA delivers better solution quality than the compared algorithms.
2018, 55(11): 2386-2394.
DOI: 10.7544/issn1000-1239.2018.20170381
Abstract:
Model-based diagnosis is an important research area in the field of artificial intelligence, and diagnosis based on minimal conflicts is a classical method for solving the diagnosis problem; computing minimal conflicts is therefore an important step in model-based diagnosis. After studying the characteristics of the circuit model, this paper proposes an algorithm for computing minimal conflict sets based on the structural feature of fault output (MCS-SFFO), which builds on the CSRDSE algorithm. Firstly, the pruning rule of the CSRDSE algorithm is improved so that the child nodes of leaf nodes that are not conflict sets are no longer visited. Secondly, the concepts of component sets independent of the fault output and related to the fault output are presented, and a method of computing the component set independent of the fault output is provided according to the system description and observation. Finally, a theorem about non-conflict sets is proposed: the component set independent of the fault output and its subsets are not conflict sets. The MCS-SFFO algorithm for computing minimal conflict sets is presented according to this theorem. Compared with the CSRDSE algorithm, MCS-SFFO further prunes the non-solution space and reduces the number of calls to the SAT solver. Experimental evidence indicates that MCS-SFFO has better computational efficiency than CSRDSE.
2018, 55(11): 2395-2405.
DOI: 10.7544/issn1000-1239.2018.20170607
Abstract:
In neighborhood rough sets, attribute reduction based on information measures has fundamental research value and application significance. However, the conditional neighborhood entropy exhibits granulation non-monotonicity, so the corresponding attribute reduction is difficult to study and limited in application. Addressing this issue, a novel conditional neighborhood entropy with granulation monotonicity is constructed by virtue of granular computing technology and its relevant three-layer granular structure, and the related attribute reduction is further investigated. At first, the granulation non-monotonicity of the conditional neighborhood entropy and its roots are revealed; then, the three-layer granular structure is adopted to construct a new conditional neighborhood entropy using a bottom-up strategy, and the corresponding granulation monotonicity is obtained; furthermore, the relevant attribute reduction and its heuristic reduction algorithm are studied according to the proposed information measure with granulation monotonicity; finally, data experiments based on the UCI (University of California, Irvine) machine learning repository are carried out, verifying both the granulation monotonicity of the constructed conditional neighborhood entropy and the computational effectiveness of the related heuristic reduction algorithm. As the results show, the established conditional neighborhood entropy has granulation monotonicity and thus improves on the original conditional neighborhood entropy, and the attribute reduction it induces has broad application prospects.
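For orientation only, the Python sketch below computes one common (non-monotonic) form of conditional neighborhood entropy over a small numeric table; the distance threshold delta and the toy data are illustrative assumptions, and the paper's monotonic measure is a different construction.

```python
import numpy as np

def neighborhoods(X, delta):
    """delta-neighborhood of each sample under Euclidean distance."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    return [np.where(row <= delta)[0] for row in d]

def conditional_neighborhood_entropy(X, y, delta):
    """One common definition: -(1/n) * sum_x log(|n(x) ∩ [x]_D| / |n(x)|)."""
    n = len(y)
    nbrs = neighborhoods(X, delta)
    total = 0.0
    for i, nb in enumerate(nbrs):
        same_class = nb[y[nb] == y[i]]   # neighbors sharing x's decision class
        total += np.log(len(same_class) / len(nb))
    return -total / n

# toy usage with two conditional attributes and a binary decision;
# entropy 0 means the neighborhoods fully determine the decision
X = np.array([[0.1, 0.2], [0.15, 0.22], [0.9, 0.8], [0.85, 0.75]])
y = np.array([0, 0, 1, 1])
print(conditional_neighborhood_entropy(X, y, delta=0.2))
```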
2018, 55(11): 2406-2418.
DOI: 10.7544/issn1000-1239.2018.20170672
Abstract:
Influence maximization is the problem of finding a small set of seed nodes in a social network that maximizes the spread of a propagation item. Existing works only take into account the topic distribution of propagation items, but ignore the interest distribution of users. This paper focuses on how to select the most influential seeds when both the topic distribution of propagation items and the interest distribution of users are taken into consideration. A topic-interest independent cascade (TI-IC) propagation model is proposed, together with an expectation maximization (EM) algorithm to learn the parameters of the TI-IC model. Based on the TI-IC model, a topic-interest influence maximization (TIIM) problem is formulated, and a new heuristic algorithm called ACG-TIIM is presented to solve it. ACG-TIIM first takes each user as a root node to construct a reachable path tree and roughly estimates the influence scope of each user, then sorts all users by the estimated influence scope to select a small number of candidate seeds, and finally uses a greedy algorithm with EFLF optimization to select the most influential seeds from the candidates. Experimental results on real datasets show that the TI-IC model is superior to the classical IC and TIC models in describing the propagation law and predicting propagation results, and that ACG-TIIM solves the TIIM problem effectively and efficiently.
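As a point of reference (not the ACG-TIIM algorithm itself), the sketch below shows plain greedy seed selection under a basic independent cascade model with Monte Carlo spread estimation; the graph, edge probabilities, seed budget, and simulation count are illustrative assumptions.

```python
import random

def simulate_ic(graph, seeds, trials=200):
    """Average spread of `seeds` under the independent cascade model.
    `graph` maps node -> list of (neighbor, activation_probability)."""
    total = 0
    for _ in range(trials):
        active, frontier = set(seeds), list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v, p in graph.get(u, []):
                    if v not in active and random.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / trials

def greedy_seeds(graph, k, trials=200):
    """Pick k seeds, each time adding the node with the largest marginal gain."""
    seeds = []
    for _ in range(k):
        best, best_gain = None, -1.0
        base = simulate_ic(graph, seeds, trials) if seeds else 0.0
        for v in graph:
            if v in seeds:
                continue
            gain = simulate_ic(graph, seeds + [v], trials) - base
            if gain > best_gain:
                best, best_gain = v, gain
        seeds.append(best)
    return seeds

# toy usage on a 4-node graph
g = {0: [(1, 0.4), (2, 0.4)], 1: [(3, 0.3)], 2: [(3, 0.3)], 3: []}
print(greedy_seeds(g, k=2))
```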
2018, 55(11): 2419-2429.
DOI: 10.7544/issn1000-1239.2018.20170227
Abstract:
Clustering by fast search and find of density peaks (the density peaks clustering algorithm, DPC) is a clustering analysis algorithm proposed in 2014. It draws a decision graph based on the observation that cluster centers have relatively high local density and lie relatively far from any point of higher density, and then finds the density peaks so as to obtain clusters of arbitrary shape. However, because the computation of local density and distance relies on the similarity matrix when searching for cluster centers, the computational complexity is relatively high, which limits the application of DPC to large-scale datasets. In this paper, a density peaks clustering algorithm based on grid screening (SDPC) is proposed. SDPC first removes points with sparse local density using a grid-based method that exploits the uneven distribution of the dataset. Cluster centers are then selected by drawing the decision graph as in DPC, which effectively reduces the computational complexity while preserving clustering accuracy. Theoretical analysis and experimental results show that DPC based on grid screening can not only cluster large-scale datasets correctly, but also greatly reduce the time complexity.
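To make the decision-graph quantities concrete, this Python sketch (standard DPC, without the paper's grid screening; the cutoff distance and toy points are illustrative assumptions) computes each point's local density rho and its distance delta to the nearest point of higher density.

```python
import numpy as np

def decision_graph(X, dc):
    """Return (rho, delta) for the DPC decision graph.
    rho: Gaussian-kernel local density with cutoff dc;
    delta: distance to the nearest point of higher density."""
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    rho = np.exp(-(d / dc) ** 2).sum(axis=1) - 1.0   # subtract self-contribution
    delta = np.empty(len(X))
    for i in range(len(X)):
        higher = np.where(rho > rho[i])[0]
        delta[i] = d[i].max() if higher.size == 0 else d[i, higher].min()
    return rho, delta

# toy usage: two well-separated blobs; the blob centers stand out in the
# decision graph with both large rho and large delta
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
rho, delta = decision_graph(X, dc=0.5)
print(np.round(rho, 3), np.round(delta, 3))
```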
2018, 55(11): 2430-2438.
DOI: 10.7544/issn1000-1239.2018.20170580
Abstract:
In this paper, a monaural speech enhancement method combining a deep neural network (DNN) with sparse non-negative matrix factorization (SNMF) is proposed. The method exploits the sparsity of speech signals in the time-frequency (T-F) domain and the spectral preservation ability that DNNs exhibit in speech enhancement, aiming to resolve the distortion introduced in conventional non-negative matrix factorization (NMF) methods by low-SNR conditions and by unvoiced components that lack structural characteristics. Firstly, the magnitude spectrogram of the noisy speech is decomposed by NMF with a sparsity constraint to obtain the coding coefficients corresponding to the speech and noise dictionaries, which are pre-trained independently. Wiener filtering is then used to obtain the separated speech and noise. A DNN is employed to model the nonlinear function that maps the log-magnitude spectrum of the speech separated by the Wiener filter to that of the target clean speech. Evaluations are conducted on the IEEE dataset, with both stationary and non-stationary noise types selected to demonstrate the effectiveness of the proposed method. The experimental results show that the proposed method effectively suppresses noise while preserving the speech component of the corrupted signal, and it outperforms the baseline methods in terms of perceptual quality and log-spectral distortion.
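The sketch below illustrates the generic NMF-plus-Wiener front end described above (not the authors' exact formulation, and without the sparsity penalty or the DNN mapping stage); the pre-trained dictionaries, spectrogram, and iteration count are illustrative assumptions.

```python
import numpy as np

def nmf_activations(V, W, n_iter=100, eps=1e-9):
    """Fixed-dictionary NMF: solve V ≈ W @ H for H with multiplicative updates
    (Euclidean cost). V: magnitude spectrogram, W: concatenated dictionaries."""
    H = np.random.rand(W.shape[1], V.shape[1])
    for _ in range(n_iter):
        H *= (W.T @ V) / (W.T @ W @ H + eps)
    return H

def wiener_separate(V, W_speech, W_noise):
    """Estimate the speech magnitude with a Wiener-style gain built from the
    NMF reconstructions of the speech and noise parts."""
    W = np.hstack([W_speech, W_noise])
    H = nmf_activations(V, W)
    k = W_speech.shape[1]
    S_hat = W_speech @ H[:k]              # speech reconstruction
    N_hat = W_noise @ H[k:]               # noise reconstruction
    gain = S_hat / (S_hat + N_hat + 1e-9) # Wiener-style mask
    return gain * V

# toy usage with random "pre-trained" dictionaries and a random spectrogram
rng = np.random.default_rng(0)
V = rng.random((64, 50))                  # 64 frequency bins x 50 frames
W_speech, W_noise = rng.random((64, 10)), rng.random((64, 5))
print(wiener_separate(V, W_speech, W_noise).shape)
```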
Abstract:
Plentiful, well-labeled training samples are a significant foundation for the good performance of supervised learning, but obtaining such samples is labor-intensive and time-consuming. Furthermore, it is not always feasible to obtain enough well-labeled sample data in every application to support classifier training. Meanwhile, directly applying a model trained on a source domain to a target domain normally causes accuracy degradation, due to the discrepancy between the information distributions of the source and target domains. To address these problems, we propose an algorithm named domain alignment based on multi-viewpoint domain-shared features for cross-domain sentiment classification (DAMF). Firstly, we fuse three sentiment lexicons to eliminate the polarity divergence of the domain-shared feature words, which are chosen by mutual information value. On this basis, we extract word pairs that have the same sentiment polarity within a domain by utilizing four syntactic rules, and word pairs that are strongly associated within a domain by utilizing an association rule algorithm. Then, we use the domain-shared words that have no polarity divergence as a bridge to establish an indirect mapping between domain-specific words in different domains. By constructing a unified feature representation space across domains, domain alignment is achieved. Experiments on four public datasets from the Amazon product review corpora show the effectiveness of the proposed algorithm for cross-domain sentiment classification.
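To illustrate the kind of mutual-information scoring used for selecting shared feature words, here is a small Python sketch of one common pivot-selection heuristic (a generic formulation, not the DAMF implementation; the toy documents, labels, and vocabulary are assumptions): score each word that occurs in both domains by its mutual information with the sentiment label and keep the top-scoring ones.

```python
import math
from collections import Counter

def mutual_information(docs, labels, word):
    """MI between the presence of `word` and the binary sentiment label."""
    n = len(docs)
    joint = Counter((word in doc, lab) for doc, lab in zip(docs, labels))
    p_w = {v: sum(c for (w, _), c in joint.items() if w == v) / n for v in (True, False)}
    p_l = {v: sum(c for (_, l), c in joint.items() if l == v) / n for v in (0, 1)}
    mi = 0.0
    for (w, l), c in joint.items():
        p = c / n
        mi += p * math.log(p / (p_w[w] * p_l[l]))
    return mi

# toy usage: rank candidate shared words that occur in both domains
src = [({"great", "battery"}, 1), ({"poor", "battery"}, 0)]
tgt = [({"great", "plot"}, 1), ({"poor", "plot"}, 0)]
docs, labels = zip(*(src + tgt))
vocab_both = set.union(*[d for d, _ in src]) & set.union(*[d for d, _ in tgt])
scores = {w: mutual_information(docs, labels, w) for w in vocab_both}
print(sorted(scores.items(), key=lambda kv: -kv[1]))
```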
2018, 55(11): 2452-2466.
DOI: 10.7544/issn1000-1239.2018.20170658
Abstract:
With the flourishing development of blockchain technology, represented by Bitcoin, blockchain has moved from the era of programmable currency into the era of the smart contract. A smart contract is an event-driven, state-based code-and-algorithm contract, which has attracted wide attention and study as blockchain technology has developed further. Protocols and user interfaces are applied to complete all steps of the smart contract process, and smart contracts enable users to implement personalized logic on the blockchain. Blockchain-based smart contract technology has the characteristics of decentralization, autonomy, observability, verifiability and information sharing. It can be effectively applied to build programmable finance and a programmable society, and it has been widely used in digital payment, financial asset disposal, multi-signature contracts, cloud computing, the Internet of things, the sharing economy and other fields. This survey describes the basic concepts of smart contract technology, its whole life cycle, basic classification and structure, key technologies and the state of the art, as well as its application scenarios and the main technology platforms; the problems it currently encounters are also discussed. Finally, based on the theoretical knowledge of smart contracts, we set up an Ethereum experimental environment and develop a crowdsale contract system. The survey aims to provide helpful guidance and reference for future research on blockchain-based smart contracts.
Abstract:
With the adoption of general communication protocol standards, the wireless service system (WSS) in the IoT has been proposed to achieve real-time connection between people and things or between things and things. According to the characteristics of IEEE 802.15.4, wireless sensor nodes are embedded in devices for data collection and command broadcasting. However, the isomorphism of the sensor nodes makes worm propagation an increasingly serious problem. Firstly, based on a classification of epidemiological models related to worm propagation and an analysis of the characteristics of the various models, an epidemiological model is constructed that introduces a sleep state and a quarantine state into the state transitions, and the transition relationships of nodes are defined at the same time. Secondly, according to the radio frequency, the number and range of infected nodes with actual transmission ability are determined. Thirdly, we introduce a target cost function between the worm and the wireless service system, and put forward a dynamic differential game with complete information based on the overall damage. Then, the existence of a saddle-point solution is proved, and the solution is obtained by combining the state parameters, costate variables and Hamiltonian functions. An optimal defense algorithm is proposed to minimize the target cost function. Finally, different algorithms are implemented and their performance is evaluated by comparing the characteristics of nodes in each state and the corresponding overall damage. The experimental results show that the optimal defense algorithm based on the improved epidemic model can suppress worm propagation in the wireless service system effectively and efficiently.
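As a generic illustration of the kind of compartment model described above (not the paper's model; the transition rates, initial counts, and time step below are purely hypothetical), this Python sketch integrates a susceptible/infected/sleeping/quarantined/recovered system with a simple Euler scheme.

```python
# Euler integration of a hypothetical worm-propagation compartment model
# with sleep (P) and quarantine (Q) states added to an SIR-style core.
def simulate(days=50, dt=0.1, beta=0.3, sleep=0.05, wake=0.1,
             quarantine=0.2, recover=0.1, N=1000.0):
    S, I, P, Q, R = N - 1.0, 1.0, 0.0, 0.0, 0.0
    history = []
    for _ in range(int(days / dt)):
        infect = beta * S * I / N          # new infections per unit time
        dS = -infect + wake * P - sleep * S
        dP = sleep * S - wake * P          # sleeping nodes cannot be infected
        dI = infect - quarantine * I
        dQ = quarantine * I - recover * Q
        dR = recover * Q
        S, I, P, Q, R = (S + dS * dt, I + dI * dt, P + dP * dt,
                         Q + dQ * dt, R + dR * dt)
        history.append((S, I, P, Q, R))
    return history

final = simulate()[-1]
print([round(x, 1) for x in final])
```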
2018, 55(11): 2482-2489.
DOI: 10.7544/issn1000-1239.2018.20170420
Abstract:
With the rapid development of cloud computing and the arrival of the big data era, users are confronted with huge amounts of data and information to process, which entails many demanding computing tasks. Consequently, how to securely outsource time-consuming computing tasks to an untrusted public cloud server has aroused widespread concern. To achieve data privacy protection and verifiability of the calculation results in outsourced computing, this paper proposes, under the single-server model, a new privacy-preserving protocol for outsourcing power exponentiation on a group field, called GEXP (outsourcing power exponentiation on a group field). The scheme prevents adversaries from obtaining any input or output data, and it effectively avoids the collusion attack that arises in the dual-server model. Compared with existing schemes, GEXP detects a wrong result returned by the cloud server with probability 1, which ensures that the user can fully verify the result of the outsourced calculation. The formal security analysis and experiments indicate that our scheme protects privacy and is highly efficient. In the experiments, we compare our scheme with other state-of-the-art schemes to further demonstrate its superiority in security and efficiency. In addition, to prove the practicality of the scheme, this paper presents a concrete application of GEXP in cloud storage data integrity verification.
2018, 55(11): 2490-2500.
DOI: 10.7544/issn1000-1239.2018.20170666
Abstract:
Because of their strong robustness and good scalability, TWSNs (two-tiered wireless sensor networks), which serve as parts of IoT (Internet of things) observation systems, are attracting more and more attention. However, many security problems in TWSNs have not yet been well solved. In hostile environments, adversaries are prone to illegally obtaining the information stored on the master nodes, the key nodes of TWSNs, and even to destroying the integrity of the query results returned to the Sink node by capturing master nodes and turning them malicious. In this paper, we focus on the problem of privacy-and-integrity preservation for Top-k queries in TWSNs and propose a secure query-processing protocol named VPP (verifiable privacy-and-integrity preservation). Based on OPES (order-preserving encryption scheme), symmetric ciphering (SC) and weight-binding techniques, VPP achieves privacy-and-integrity preservation for Top-k queries by specifying the data preprocessing mechanism at the sensor nodes, the Top-k query-processing mechanism at the storage nodes, and the integrity-validation method at the Sink node. Both theoretical analysis and simulation results show that VPP outperforms the state-of-the-art scheme in both security and energy efficiency of Top-k query processing in TWSNs, with reasonable computational complexity.
2018, 55(11): 2501-2510.
DOI: 10.7544/issn1000-1239.2018.20170661
Abstract:
Location is one of the basic properties of an object. With the development of wireless sensor networks (WSNs), requirements on sensor node localization have become more and more important in practical applications. Ultra-wideband (UWB) and inertial measurement units (IMUs) have been widely used in WSNs due to their high positioning accuracy. UWB offers high accuracy but is susceptible to multipath effects and to the relative geometric positions of the nodes. An IMU can provide continuous inertial information, but cannot overcome the problem of cumulative error. Thus, a fusion positioning method based on UWB and IMU can compensate for the UWB multipath effect and the IMU error accumulation, and ultimately improve positioning accuracy. In this paper, a new fusion positioning method based on UWB and IMU is proposed to realize high-precision position tracking of target nodes in a sensor network. The CRLB (Cramer-Rao lower bound) is calculated to characterize the spatial positioning performance of the fusion method, and the PCRLB (posterior Cramer-Rao lower bound) is calculated to characterize its temporal positioning performance; both bounds provide theoretical support for the design and simulation of the fusion positioning algorithms. Experimental results show that the proposed fusion method achieves better positioning performance in both the temporal and spatial aspects, and comes closer to the theoretical lower bound in practical applications.
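For intuition about UWB/IMU fusion (not the authors' algorithm; the 1-D motion model, noise variances, and measurements below are illustrative assumptions), this Python sketch runs a minimal Kalman filter that propagates the state with IMU acceleration and corrects it with UWB position measurements.

```python
import numpy as np

def kalman_fuse(accels, ranges, dt=0.1, q=0.05, r=0.2):
    """1-D position/velocity Kalman filter: IMU acceleration drives the
    prediction, UWB (here treated as a noisy position) drives the update."""
    x = np.zeros(2)                         # state: [position, velocity]
    P = np.eye(2)
    F = np.array([[1.0, dt], [0.0, 1.0]])   # constant-velocity transition
    B = np.array([0.5 * dt * dt, dt])       # control (acceleration) input
    H = np.array([[1.0, 0.0]])              # UWB observes position only
    Q, R = q * np.eye(2), np.array([[r]])
    track = []
    for a, z in zip(accels, ranges):
        x = F @ x + B * a                   # predict with IMU acceleration
        P = F @ P @ F.T + Q
        y = np.array([z]) - H @ x           # innovation from UWB measurement
        S = H @ P @ H.T + R
        K = P @ H.T @ np.linalg.inv(S)
        x = x + K @ y
        P = (np.eye(2) - K @ H) @ P
        track.append(x[0])
    return track

# toy usage: constant acceleration, noisy "UWB" position readings
t = np.arange(0, 5, 0.1)
true_pos = 0.5 * 1.0 * t ** 2
est = kalman_fuse(np.full_like(t, 1.0),
                  true_pos + np.random.normal(0, 0.2, t.size))
print(round(est[-1], 2), round(true_pos[-1], 2))
```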
2018, 55(11): 2511-2521.
DOI: 10.7544/issn1000-1239.2018.20170710
Abstract:
Taking energy efficiency as the objective function, a nonlinear programming problem with nonlinear constraints is studied under constraints on time delay and transmission power. Specifically, a new power-resource allocation algorithm (PAA) with interference restraint, based on FBMC-OQAM (filter bank multicarrier-offset quadrature amplitude modulation), is presented in this paper. It improves the energy efficiency of the entire network and protects the small-cell users (SUs) in the network from excessive interference, while a virtual queue is used to transform the extra packet delay caused by multi-user channel contention into queuing delay in the virtual queue. An iterative algorithm is used to solve the PAA problem: the fractional objective function is transformed into polynomial form, and the global optimal solution is obtained by iteration after reducing the computational complexity. At the same time, a sub-optimal method is developed that further reduces the computational complexity at the cost of some performance. The simulation results show that the optimal algorithm achieves higher performance, while the sub-optimal method has lower computational complexity. The designed algorithm has important value for practical applications such as the Internet of things, the Internet of vehicles, signal processing and artificial intelligence, and it has already been used in our cognitive radio network (CRN) project to solve the power-resource allocation problem.
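The transformation from a fractional objective to a subtractive (polynomial) form is typically done with a Dinkelbach-style iteration; the single-link Python sketch below (not the paper's PAA, and with illustrative bandwidth, channel gain, noise, circuit power and power limit) maximizes rate / (transmit power + circuit power) in that way.

```python
import math

def dinkelbach_power(B=1e6, g=1e-3, N0=1e-9, Pc=0.1, Pmax=1.0,
                     tol=1e-9, max_iter=100):
    """Maximize rate / (transmit power + circuit power) for one link.
    Inner step: argmax_p  R(p) - lam * (p + Pc)  has a closed form."""
    rate = lambda p: B * math.log2(1.0 + p * g / N0)
    lam, p = 0.0, Pmax
    for _ in range(max_iter):
        if lam > 0.0:
            p = B / (lam * math.log(2)) - N0 / g   # stationary point of R(p) - lam*p
            p = min(max(p, 0.0), Pmax)             # respect the power constraint
        new_lam = rate(p) / (p + Pc)               # updated energy efficiency
        if abs(new_lam - lam) < tol * max(new_lam, 1.0):
            break
        lam = new_lam
    return p, lam

p_opt, ee = dinkelbach_power()
print(round(p_opt, 4), round(ee, 1))
```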
2018, 55(11): 2522-2531.
DOI: 10.7544/issn1000-1239.2018.20170664
Abstract:
With the rapid development of IoT technology, everything in the world is becoming interconnected through the IoT, which has become a popular technology that allows users to connect anywhere, anytime, with anyone and any device, in domains such as the smart home and intelligent transportation. However, most IoT heterogeneous data are isolated, which holds back the progress of IoT interconnection. Schema matching techniques are widely used in data interconnection scenarios to solve this problem. Because of the characteristics of IoT heterogeneous data, such as heterogeneity and continual growth, existing schema matching approaches cannot handle automatic schema matching in the new IoT environment. In this paper, we attempt to solve this problem with a new hierarchical algorithm that performs automatic schema matching for IoT heterogeneous data. The algorithm has three parts: classification matching, clustering matching and mixed element matching; with each step, it narrows down the matching space and improves the matching quality. We demonstrate the utility and efficiency of the algorithm with a set of comprehensive experiments on real datasets from an IoT testing scenario for industrial household appliances. The results show that the algorithm performs well.
2018, 55(11): 2532-2542.
DOI: 10.7544/issn1000-1239.2018.20170671
Abstract:
An industrial control system (ICS) is highly correlated with its physical environment. As a unique type of ICS attack, a sequence attack injects normal operations into the wrong sequence positions, which disturbs the process or even destroys the equipment. At present, most anomaly detection methods for sequence attacks only examine the operation sequence acquired from the information flow. However, an ICS is weak at protecting itself from cyber-attacks, which means the data in the information flow can be faked by attackers; such fake data is one of the main issues that severely affect detection accuracy. To remedy this problem, a fusion ICS anomaly detection algorithm is proposed in this paper. The algorithm utilizes the state information of the equipment to establish a state flow. By fusing the state flow with the information flow, anomalies in the operation sequence can be detected in terms of both time and order. Meanwhile, to extend the detection range and reduce the detection latency, we use the data of the state flow to recognize anomalous equipment states between two operations, which may be caused by sequence attacks or other attacks. The experimental results on an ICS testbed demonstrate that our detection algorithm can detect sequence attacks efficiently and recognize part of the anomalous states of ICS equipment.
2018, 55(11): 2543-2556.
DOI: 10.7544/issn1000-1239.2018.20170390
Abstract:
Hair is a vital component of a person’s identity, and it can provide strong cues about age, background and even personality. More and more researchers focus on hair modeling in the fields of computer graphics and virtual reality. The traditional hair modeling method is physically based simulation with various parameter settings; the computation is expensive, and the construction process is neither intuitive nor easy to control. Image-based hair modeling, by contrast, offers fast modeling and high fidelity, and researchers have attached great importance to it in recent years. This paper reviews hair modeling based on a single image, static hair modeling based on multiple images, dynamic hair modeling based on captured videos, and the editing and reuse of hair modeling results, analyzing and summarizing the applications and shortcomings of each method. The first section summarizes single-image hair modeling, which can be divided into orientation-field-based and data-driven methods. Static hair modeling and dynamic hair capture are reviewed in the next two parts; static hair modeling based on multiple images can likewise be divided into orientation-field-based and data-driven methods. The last section reviews the editing and reuse of hair modeling results. Future development trends and challenges of image-based hair modeling are discussed at the end.
2018, 55(11): 2557-2568.
DOI: 10.7544/issn1000-1239.2018.20170247
Abstract:
Using a 4-neighbor template to perform nonlocal prediction on the index map is one of the typical palette coding techniques. Analysis of experimental results shows that a 4-neighbor template usually has a large number of interfering templates and cannot effectively capture the color transition features in the anti-aliased regions of edges. Therefore, a 2-neighbor template is proposed that includes four subtemplates to represent the particular color transition modes of foreground objects and text edges at their upper-left, lower-left, upper-right and lower-right corners. Meanwhile, the template prediction is further modeled as a transition probability that can be implemented with table lookup operations, and an index map prediction method using the 2-neighbor joint transition probability is proposed. Experimental results show that the prediction accuracy of the proposed method is 97.70%, which is 4.50% and 2.27% higher than that of the multi-stage prediction (MSP) method and the local directional prediction (LDP) method, respectively. The method is especially suitable for complex screen content with a large number of characters and computer-generated geometric primitives. Moreover, its computational complexity is equivalent to that of the LDP method and clearly lower than that of MSP. The proposed method can be applied to palette-index-map-based screen content predictive coding with high real-time demands.
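To show the general idea of transition-probability prediction by table lookup (a simplified stand-in for the paper's 2-neighbor subtemplates; the context definition and toy index map are illustrative assumptions), the Python sketch below counts how often each (left, above) index pair is followed by each current index, then predicts the most frequent one.

```python
from collections import Counter, defaultdict

def train_transition_table(index_map):
    """Count current-index frequencies conditioned on the (left, above) context."""
    table = defaultdict(Counter)
    for r in range(1, len(index_map)):
        for c in range(1, len(index_map[0])):
            ctx = (index_map[r][c - 1], index_map[r - 1][c])
            table[ctx][index_map[r][c]] += 1
    return table

def predict(table, left, above, fallback):
    """Look up the most probable index for a context; fall back if unseen."""
    ctx = table.get((left, above))
    return ctx.most_common(1)[0][0] if ctx else fallback

# toy usage on a small palette-index map with a vertical edge
idx = [[0, 0, 1, 1],
       [0, 0, 1, 1],
       [0, 0, 1, 1]]
table = train_transition_table(idx)
print(predict(table, left=0, above=0, fallback=0),
      predict(table, left=1, above=1, fallback=0))
```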
2018, 55(11): 2569-2583.
DOI: 10.7544/issn1000-1239.2018.20170893
Abstract:
Reliability is an important figure of merit for a system and must be satisfied in safety-critical applications. A system’s reliability can be improved through resource redundancy; however, this consumes more system resources. This paper investigates the problem of minimizing resource consumption while satisfying a reliability goal for parallel applications on heterogeneous systems. First, the reliability goal of the system is transformed into a reliability goal for each individual task, in which the average worst-case execution time (WCET) of a task across processors is used as a reference for calculating the goal; two methods for calculating each task’s reliability goal are proposed, one for task replication and one for non-replication. Then, an algorithm without task replication for minimizing resource consumption under a reliability goal (MRC) is designed: as long as the required system reliability goal is no higher than the reachable maximum reliability, tasks can always be assigned to appropriate processors so that the goal is satisfied. Considering that higher reliability requirements cannot be satisfied by MRC alone, two algorithms with task replication are finally designed to satisfy the system reliability goal. The proposed algorithms are compared with MaxRe, RR and MRCRG on real parallel applications and on randomly generated parallel applications. Experimental results demonstrate that the proposed algorithms consume fewer resources while satisfying the system reliability goal.
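The reliability bookkeeping behind such scheduling commonly uses the standard exponential failure model; the Python sketch below (a generic illustration with made-up failure rates and execution times, not the paper's MRC algorithm) computes a task's reliability on a processor, the reliability of a replicated task, and the reliability of a whole application.

```python
import math

def task_reliability(failure_rate, exec_time):
    """Standard exponential model: R = exp(-lambda * w) for one execution."""
    return math.exp(-failure_rate * exec_time)

def replicated_reliability(replicas):
    """A replicated task succeeds if at least one replica succeeds.
    `replicas` is a list of (failure_rate, exec_time) assignments."""
    fail_all = 1.0
    for lam, w in replicas:
        fail_all *= 1.0 - task_reliability(lam, w)
    return 1.0 - fail_all

def application_reliability(tasks):
    """The application succeeds only if every task succeeds."""
    r = 1.0
    for replicas in tasks:
        r *= replicated_reliability(replicas)
    return r

# toy usage: two tasks, the second replicated on two heterogeneous processors
tasks = [
    [(1e-4, 20.0)],                 # task 1: single copy
    [(1e-4, 35.0), (2e-4, 25.0)],   # task 2: two replicas
]
print(round(application_reliability(tasks), 6))
```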