ISSN 1000-1239 CN 11-1777/TP

Table of Content

01 March 2016, Volume 53 Issue 3
A Hybrid Approach for Ripple Effect Analysis of Software Evolution Activities
2016, 53(3):  503-516.  doi:10.7544/issn1000-1239.2016.20140727
Asbtract ( 651 )   HTML ( 1)   PDF (4136KB) ( 631 )  
Related Articles | Metrics
Ripple effect is defined as the process of determining potential effects to a subject system resulting from a proposed software evolving activity. Since software engineers perform different evolving activities to respond to different kinds of requirements, ripple effect analysis has been globally recognized as a key factor of affecting the success of software evolution projects. The precision of most existing ripple effect analysis methods is not as good as expectation and lots of methods have their inherent limitations. This paper proposes a hybrid analysis method which combines the dynamic and information retrieval based techniques to support ripple effect analysis in source code. This combination is able to keep the feature of high recall rate of dynamic method and reduce the adverse effects of noise and analysis scope by the domain knowledge which is derived from source code by information retrieval technique. In order to verify the effectiveness of the proposed approach, we have performed the ripple effect analysis and compared the analysis results produced by dynamic, static, text, historical evolving knowledge based methods with the proposed method on one open source software named jEdit. The results show that the hybrid ripple effect analysis method, across several cut points, provides statistically significant improvements in both precision and recall rate over these techniques used independently.
A Class Integration Test Order Method Based on the Node Importance of Software
2016, 53(3):  517-530.  doi:10.7544/issn1000-1239.2016.20148318
Asbtract ( 874 )   HTML ( 1)   PDF (3571KB) ( 705 )  
Related Articles | Metrics
Class integration test order is one of the most important issues in the field of software integration testing. The different integration test orders have great influence on the testing costs and testing efficiency. In order to reduce the test costs, the traditional studies of generating class-level integration test order focused on reducing the total test stub complexities and the number of test stubs. We apply the thought of “bugs should be found as early as possible” to generate the integration test order, and propose an evaluation scheme “Class-HITS” to measure the importance of software nodes. In the proposed scheme, we firstly consider the software as a network based on the complex network theory, and with the help of proposed scheme, the circles of the network are eliminated. Finally, the integration test order is obtained by reverse sorting the nodes in acyclic links. The simulations have been carried out on open-source software, which prove the efficiency of the class integration test order obtained by the proposed scheme. The new algorithm ensures the important nodes with priority to be tested, the smaller total complexity of the test stubs.
A Graph Database Based Method for Parsing and Searching Code Structure
2016, 53(3):  531-540.  doi:10.7544/issn1000-1239.2016.20148325
Asbtract ( 761 )   HTML ( 5)   PDF (2121KB) ( 679 )  
Related Articles | Metrics
Software reuse is a solution to reduce duplication of effort in software development. When reusing an existing software project, software developers usually need to understand how code elements in it are worked and their correlation, which is called code structure. Software developers usually navigate among source code files to understand code structure. This task could be time-consuming and difficult, since source code of a software project is usually large and complex. Therefore, it is essential to demonstrate code structure in an automatic way that software developers can understand it clearly. For this purpose, this paper introduces a graph database based method for parsing and searching code structure. Code structure is extracted from source code files, and well-organized as a labeled and directed graph in graph database. Software developers input natural language queries. A search mechanism analyzes each of these queries, searches the whole code structure and determines which part of the code structure should be demonstrated. This method is of high extensibility: code elements at different granularity and various relationship types among them can be easily stored into the graph database, and analyzing algorithms for different search purposes can be easily integrated into the search mechanism. A tool is implemented based on this method. Experiment shows that with the help of this tool, the time software developers spending on understanding code structure reduces by 17%, which validates that our method does help improving the efficiency of software reuse. An industrial case study has been showed on how software developers get help from this method.
Output Domain Based Automatic Test Case Generation
You Feng, Zhao Ruilian, Lü Shanshan
2016, 53(3):  541-549.  doi:10.7544/issn1000-1239.2016.20148045
Asbtract ( 709 )   HTML ( 2)   PDF (2053KB) ( 477 )  
Related Articles | Metrics
For most software systems it is very hard to obtain expected output automatically on the basis of specifications. However, there exist many notable detection points in output domain of some software, so it may be more suitable to develop test cases from output domain than from input. In addition, even if an output is given, it is also difficult to find its input automatically. Therefore in this paper, we present an output domain based automatic test case generation method. At first, a back propagation neural network is used to create a model that can be taken as a function substitute for the software under test, and then according to the created function model, genetic algorithms are employed to search the corresponding inputs for given outputs. In order to improve the effectiveness of test case generation, a new crossover operation and a mutation operation are introduced in our genetic algorithm. Moreover, a number of experiments have been conducted on test generation based on the created function models over the fault tolerant software RSDIMU and three common used software. The experimental results show that the approach is promising and effective, and our genetic algorithm can distinctly enhance the efficiency and successful ratio to test case generation from output domains.
Regression Test Case Prioritization Based on Bug Propagation Network
2016, 53(3):  550-558.  doi:10.7544/issn1000-1239.2016.20148329
Asbtract ( 858 )   HTML ( 2)   PDF (1815KB) ( 678 )  
Related Articles | Metrics
Test case prioritization (TCP), as one of the regression testing techniques, can greatly improve the efficiency of regression testing. Considering that most of the existing TCP techniques neglect to use software structure information, this paper proposes a new regression test case prioritization technique based on bug propagation network. It uses weighted class dependency network (WCDN) to represent the topological structure of a piece of software at the class level of granularity, and then analyzes the propagation process of bugs on WCDN to construct the BPN. When performing regression test case prioritization, it first locates the modified classes and the potential impacted classes. Then it calculates the test influence for each class. Finally, the test importance for each test case, the sum of the test importance of all classes it covered, is calculated as the criteria to prioritize regression test cases. Case studies on several real world open-source software systems show that, compared with other test case prioritization techniques, the proposed technique has better effectiveness and comparable stability.
Efficient Duplicate Detection Approach for High Dimensional Big Data
2016, 53(3):  559-570.  doi:10.7544/issn1000-1239.2016.20148218
Asbtract ( 941 )   HTML ( 3)   PDF (3819KB) ( 841 )  
Related Articles | Metrics
The big data era has huge quantity of heterogeneous data from multiple sources be widely used in various domains. Data from multiple sources and of various structures make data duplication inevitable. In addition, such a large amount of data generates an increasing demand for efficient duplicate detection algorithms. Traditional approaches have difficulties in dealing with high dimensional data in big data scenarios. This paper analyses the deficiency of traditional SNM(sorted neighbour method) methods and proposes a novel approach based on clustering. An efficient indexing mechanism is first created with the help of R-tree, which is a variant of B-tree for multi-dimensional space. The proposed algorithm reduces the comparisons needed by taking advantage of the characteristics of clusters and outperforms existing duplicate detection approaches such as SNM, DCS, and DCS++. Furthermore, based on the apriori property of duplicate detection, we develop a new algorithm which can generate the duplicate candidates in parallel manner of the projection of original dataset and then use them to reduce search space of high-dimensional data. Experimental results show that this parallel approach works efficiently when high-dimensional data is encountered. This significant performance improvement suggests that it is ideal for duplicate detection for high dimensional big data.
A Compression-Based Approximate Query Method for Massive Incomplete Data
2016, 53(3):  571-581.  doi:10.7544/issn1000-1239.2016.20150620
Asbtract ( 671 )   HTML ( 2)   PDF (2253KB) ( 421 )  
Related Articles | Metrics
With the explosive increase of data, incomplete data are widespread. Traditional methods of data repair will cause high processing cost for mass data, and cannot be fully restored. Thus the approximate querying on these huge amounts of incomplete data for meeting the given requirements attracted greater attention from academics. Therefore, this paper proposes an approximate query method for massive incomplete data based on compression. Tagging the missing attribute value field and finding out the frequent query conditions, this method compresses these data based on the statistical frequent query conditions, and establishes the corresponding indexes. According to the attribute partition rules, index files are compressed again in order to further save storage space. In the stage of query, this method uses encoding dictionary to make selection and projection operations on the index compression files for getting approximate query results of incomplete data in the end. Experimental results show that this method can quickly locate the position of incomplete data compression, improve the query efficiency, save the storage space, and ensure the integrity of the query results.
Knowledge Graph Construction Techniques
2016, 53(3):  582-600.  doi:10.7544/issn1000-1239.2016.20148228
Asbtract ( 11052 )   HTML ( 353)   PDF (2414KB) ( 18306 )  
Related Articles | Metrics
Google’s knowledge graph technology has drawn a lot of research attentions in recent years. However, due to the limited public disclosure of technical details, people find it difficult to understand the connotation and value of this technology. In this paper, we introduce the key techniques involved in the construction of knowledge graph in a bottom-up way, starting from a clearly defined concept and a technical architecture of the knowledge graph. Firstly, we describe in detail the definition and connotation of the knowledge graph, and then we propose the technical framework for knowledge graph construction, in which the construction process is divided into three levels according to the abstract level of the input knowledge materials, including the information extraction layer, the knowledge integration layer, and the knowledge processing layer, respectively. Secondly, the research status of the key technologies for each level are surveyed comprehensively and also investigated critically for the purposes of gradually revealing the mysteries of the knowledge graph technology, the state-of-the-art progress, and its relationship with related disciplines. Finally, five major research challenges in this area are summarized, and the corresponding key research issues are highlighted.
Robust Influence Blocking Maximization in Social Networks
2016, 53(3):  601-610.  doi:10.7544/issn1000-1239.2016.20148341
Asbtract ( 1075 )   HTML ( 2)   PDF (2408KB) ( 745 )  
Related Articles | Metrics
Influence blocking maximization is currently a problem of great interest in the research area of social networks. Existing influence blocking methods assume negative influence sources are definitely known or non-adversarial. However, in real applications, it is hard to obtain the accurate information of influence sources. In addition, adversarial spreader may strategically select the seeds to maximize negative spread. In this paper, we aim to tackle the problems of influence blocking maximization with uncertain and strategic negative influence sources respectively. First, in order to increase the efficiency of the algorithms for mining blocking seeds, we discuss approximate estimation of the influence spread of negative seeds under the competitive linear threshold model. Based on the estimation, we propose a blocking seeds mining algorithm under the situation of finite uncertain and negative influence sources to maximize the expected influence blocking utility. In adversarial influence spread situations, one entity strategically attempts to maximize negative influence spread while the other one tries to block the negative propagation of its competing entity as much as possible by selecting a number of seed nodes that could initiate its own influence propagation. We model the interaction as a game and propose a polynomial time approximate algorithm for solving random blocking strategy based on min-max principle to decrease the worst negative influence spread. Finally, we make experiments on real data sets of social networks to verify the feasibility and scalability of the proposed algorithms.
Recognition of Abnormal Behavior Based on Data of Public Opinion on the Web
2016, 53(3):  611-620.  doi:10.7544/issn1000-1239.2016.20150746
Asbtract ( 1111 )   HTML ( 6)   PDF (2681KB) ( 1181 )  
Related Articles | Metrics
With the increasing popularity of the social network, public awareness and participation to hot topics has been much improved, mobile terminal equipment and fast Internet access make the spread of public opinion quickly. Public opinion on the Web has freedom, interactivity, diversity, deviation and burstiness as characteristics, has become an important factor that affects social stability. Therefore, how to timely detect, control and guide the development of public opinion is of great significance to the social stability. This article focuses on the behaviors that spread on the Web and contain “destruction”, “dangerous” and “loss” involves public security or judicial justice, and the behaviors is defined as abnormal behavior. We define the types of abnormal behavior that this article focuses on are aggression, injury, death, and arrests, four categories. From the point of view of information extraction, our method recognizes the abnormal behavior by identifying sentences that contain the abnormal behavior and constructs co-occurrence network of abnormal behavior, with provide the visualization analysis approach of public opinion on the Web.
Human Motion Recognition Based on Triaxial Accelerometer
2016, 53(3):  621-631.  doi:10.7544/issn1000-1239.2016.20148159
Asbtract ( 1235 )   HTML ( 1)   PDF (3896KB) ( 2381 )  
Related Articles | Metrics
In this paper, a new method is presented to realize motion detection on a mobile device. The scheme can recognize the people’s motions state according to the acceleration data as long as they simply carry a mobile device with a build-in triaxial accelerometer. The features of the motion signal are extracted in frequency domain and time domain using the method of comprehensive analysis. To enhance the adaptability of the method, the algorithm of independent direction of mobile device algorithm has been applied. The 11 major components, which have greatest contribution to the motion detection, are selected from the 21 motion’s features by principal component analysis, so the input dimension is reduced and the computational complexity of time and space of the algorithm is decreased. Based on the analysis and synthetic comparison of various classification algorithm, the J48 decision tree is accepted. According to the characteristics of the people nature motion, the hidden Markov model is introduced to improve the detection accuracy. Experiments, with different person and different motion, show that the synthesis algorithm has good accuracy and adaptability, and the highest recognition rate achieves 96.13%.
A Classification Method Using Transferring Shared Subspace
2016, 53(3):  632-643.  doi:10.7544/issn1000-1239.2016.20148263
Asbtract ( 820 )   HTML ( 2)   PDF (3119KB) ( 452 )  
Related Articles | Metrics
Transfer learning algorithms have been proved efficiently in pattern classification filed. The characteristic of transfer learning is to better use one domain information to improve the classification performance in different but related domains. In order to effectively solve the classification problems with a few labeled and abundant unlabeled data coming from different but related domains, a new algorithm named transferring shared subspace support vector machine (TS\+3VM) is proposed in this paper. Firstly a shared subspace used as the common knowledge between source domain and target domain is built and then classical support vector machine method is introduced to the subspace for the labeled data, therefore the resulting classification model has the ability of transfer learning. Specifically, using the theory of transfer learning and the principal of large margin classifier, the proposed algorithm constructs a shared subspace between two domains by maximizing the joint probability distribution of the labeled and unlabeled data. Meanwhile, in order to fully consider the distribution of the few labeled data, the classification model is trained in the augmented feature space consisting of the original space and the shared subspace. Experimental results confirm the efficiency of the proposed method.
Complexity and Algorithm for Minimum Breakpoint Median from PQ-trees
2016, 53(3):  644-650.  doi:10.7544/issn1000-1239.2016.20148258
Asbtract ( 1072 )   HTML ( 2)   PDF (1137KB) ( 479 )  
Related Articles | Metrics
A PQ-tree is a tree-based data structure which can represent large sets of permutations. Although the uncertainty of complete ancestral genomes is known, the homologous species provide information to determine the relative locations of partial genomes, whereby the PQ-tree can be designed to efficiently store ancestral genomes. In evolutionary biology, phylogenetic trees represent the evolutionary relationships between species. In the construction of phylogenetic trees, the leaves of the trees represent current species whose genomes are annotated by permutations, and the internal nodes represent ancestral genomes whose genomes are annotated by PO-trees. In order to determine the evolutionary relationships between different species, it is necessary to quantify the distances between the known permutations and the permutations in the constructed PQ-trees. Based on the measurement of breakpoint distance, we investigate p-Minimum Breakpoint Median from the PQ-tree. Given a PQ-tree and the corresponding p permutations, the goal is to identify a permutation generated by the PQ-tree that holds the minimum breakpoint distances relative to the p permutations. Our study shows that when p≥2, the p-Minimum Breakpoint Median problem from PQ-tree is NP-complete. Moreover, when p=1, the problem is fixed parameters tractable. To solve the 1-Minimum Breakpoint Median from PQ-trees, we propose a Fixed-Parameter Tractable algorithm whose computation complexity is O(3\+Kn) where K is the optimized breakpoint distance.
Quaternion Exponent Moments Based Robust Color Image Watermarking
2016, 53(3):  651-665.  doi:10.7544/issn1000-1239.2016.20148177
Asbtract ( 836 )   HTML ( 4)   PDF (6227KB) ( 744 )  
Related Articles | Metrics
It is a challenging work to design a robust color image watermarking scheme against geometric distortions. Moments and moment invariants have become a powerful tool in robust image watermarking owing to their image description capability and geometric invariance property. However, the existing moment-based watermarking schemes were mainly designed for gray images but not for color images, and detection quality and robustness will be lowered when watermark is directly embedded into the luminance component or three color channels of color images. Furthermore, the imperceptibility of the embedded watermark is not well guaranteed. Based on algebra of quaternions and exponent moments theory, a new color image watermarking algorithm robust to geometric distortions is proposed in this paper. We firstly introduce the quaternion exponent moments to deal with the color images in a holistic manner, and it is shown that the quaternion exponent moments can be obtained from the conventional exponent moments of each channel. Then we provide a theoretical framework to construct a set of combined invariants with respect to geometric distortions (rotation, scaling, and translation). And finally, we present a new color image watermarking algorithm robust to geometric distortions, which is based on a set of quaternion exponent moments invariants. Experimental results show that the proposed color image watermarking is not only invisible and robust against common signals processing such as median filter, noise adding, and JPEG compression, but also robust against the geometric distortions.
A Meet-in-the-Middle Attack on 8-Round mCrypton-96
WangGaoli, GanNan
2016, 53(3):  666-673.  doi:10.7544/issn1000-1239.2016.20148270
Asbtract ( 655 )   HTML ( 1)   PDF (2009KB) ( 584 )  
Related Articles | Metrics
mCrypton is a lightweight block cipher introduced in Information Security Application 2006 by Lim and Korkishko. mCrypton-64/96/128 denote 3 versions of the cipher with 64/96/128 b keys respectively. In this paper, we apply the meet-in-the-middle (MITM) attack on 8-round mCrypton-96, which improves the best MITM attack result on mCrypton-96 by 1 round.When analyzing the security of block ciphers, using key relations to reduce the time complexity, memory complexity and data complexity is a common method. From the property of the key schedule of mCrypton-96, we know that each round key could calculate some information of the internal register by the algebraic structure of the key schedule, and some round keys could be deduced from the other round keys. By using the relationship of key schedule and the properties of S-box, we present a MITM attack on 8-round mCrypton-96 based on the 4-round distinguisher by adding 1 round on the top and 3 rounds at the bottom. The time, memory and data complexities of the attack are 2\+{93.5} encryptions, 2\+{47} B and 2\+{57} chosen plaintexts respectively. It is illustrated that mCrypton-96 not only has an efficient performance but also possesses strong security.
A Hash Function Algorithm Based on Variable Parameter Cascade Chaos
2016, 53(3):  674-681.  doi:10.7544/issn1000-1239.2016.20148155
Asbtract ( 676 )   HTML ( 1)   PDF (1369KB) ( 640 )  
Related Articles | Metrics
A Hash function algorithm based on variable parameter cascade chaos is put forward aiming at the possible risk on the letting out of cascade chaos key and the deficiency of present Hash function. That is the status variable of another chaos system as the parameter perturbation is introduced to a Hash function cascade driving system, and the safe variable parameter cascade chaos system is realized with the control of turbulence intensity. The Hash function composed in this way not only obeys the variable parameter characteristic of chaos rules, but also possesses the feature of crosstalk step by step between the cascade subsystems. It can effectively reduce the risk of short period behavior caused by the finite computer precision and digital quantization possible, and it has great significance to improve the complexity and strong collision resistance of the compression function’s internal structure. The experimental results show that compared with other chaotic Hash algorithm and SHA-3 algorithm, this algorithm has high sensitivity to initial conditions, nice chaos and diffusion ability, strong collision resistance, simple and flexible algorithm, and strong controllability of variable parameter system; and it has a favorable prospect in the field of chaos secure communication, digital signature, etc.
Optimal Resources Allocation Algorithm for Optional Redundancy and Monitoring Strategies
2016, 53(3):  682-696.  doi:10.7544/issn1000-1239.2016.20148204
Asbtract ( 662 )   HTML ( 0)   PDF (5223KB) ( 542 )  
Related Articles | Metrics
In big data environment, the use of optional redundancy and monitoring strategy in one system increases the usage of resource and causes state space expansion for optimal resources allocation model. The performance of existing evolutionary search algorithms should be improved for the solution space formed by both integer and non-integer variables. To improve the algorithm efficiency, a memetic algorithm based on triple element array is proposed on the analysis of search neighborhood. First of all, the impact of change of variables such as monitoring rate on the system reliability increase is analyzed and then changing-length neighbor generation method is proposed for monitoring rate on neighbor analysis. The neighbor generation method is also proposed for strategy options considering the relations between components. After that, local search operator is refined through the iterative search among components, which increases the search range while maintaining the local advantage of individuals. This operator is used for improving the whole framework of memetic algorithm. Experiment results indicates that this algorithm can be used to get the solution of strategy option of each component and the corresponding optimized parameters for multiple optional strategies. Compared with existing multi-strategy search algorithms, the improved memetic algorithm could get better resources allocation results under the same reliability constraint. The local search operator does not have great impact on the stability of the whole algorithm.
A Fast Approximation Algorithm for the General Resource Placement Problem in Cloud Computing Platform
2016, 53(3):  697-703.  doi:10.7544/issn1000-1239.2016.20148323
Asbtract ( 765 )   HTML ( 3)   PDF (1471KB) ( 621 )  
Related Articles | Metrics
There are regionally distributed demands for various resources in cloud based large-scale online services. Given fixed resource budget, the service providers need to decide where to place resources to satisfy massive demands from all regions, where demands are usually represented by mean value in given time span. However, in scenarios with a large number of resources, demands are dynamic and stochastic, considering fine-grained demands and adopting stochastic model will further improve resource utilization. Compared with mean demand-based algorithm, considering demand stochasticity in algorithm will increase resource utilization ratio, but also leads to high time complexity. The time complexity of optimal algorithm is linear to total amount of resources, thus may be inefficient when dealing with a large number of resources. Based on nonlinear programming theory, we propose Fast Resource Placement (FRP), an effective resource placement method of high efficiency. In the algorithm, optimal solution is represented by continuous functions of input, and we construct approximation functions to reduce the computation complexity. The preliminary experiments show that in scenarios with general settings, compared with optimal algorithm, FRP can reduce the computation time by three orders of magnitude, and can achieve 99% effect of optimal solution. Therefore, FRP can be used to schedule large number of resources efficiently in time-tense scheduling scenarios.
Smart Home Energy Optimization Based on Cognition of Wearable Devices Sensor Data
2016, 53(3):  704-715.  doi:10.7544/issn1000-1239.2016.20150762
Asbtract ( 1079 )   HTML ( 9)   PDF (3500KB) ( 617 )  
Related Articles | Metrics
As the extension of smart grid in demand side, smart home energy optimization is an important branch of smart home. Smart home energy optimization aims to optimally schedule the home appliances to satisfy the comfort requirements and save the electricity cost. However, the comfort requirements are closely related to the human behavior, which has great subjectivity and uncertainty. Thus profiling the comfort requirements is one of the challenging problems. This paper presents a smart home energy management method based on the sensors data of smart wearable devices, which contains the human behavior analysis; updates the comfort requirements through creating the mapping model between human behavior and the comfort requirements by neural network; establishes the system dynamic models; and the parameters are estimated by using the sensor network data. Finally, the smart home energy optimization is solved by model predictive control. Based on the proposed method, the smart home platform is set up and the smart home energy optimization systems are developed to support the smart phone. The experiment presents promising performance on electricity cost saving and comfort improvement in four scenarios of user behaviors.
Patterns of Cardiorespiratory Activity Associated with Five Basic Emotions
2016, 53(3):  716-725.  doi:10.7544/issn1000-1239.2016.20140743
Asbtract ( 1125 )   HTML ( 10)   PDF (1060KB) ( 910 )  
Related Articles | Metrics
Affective interaction is the inexorable trend of the development of natural interaction. Physiological computing provides a new approach to understand the physiological and emotional states of users. However, there is no scientific consensus on whether there exists a stable relation between emotional states and the physiological responses. The present study review the recent research on physiological responses of autonomic nervous system activity in emotion and addressed to investigate the profile of autonomic nervous responses during the experience of five basic emotions: sadness, happiness, fear, anger, surprise and neutral. ECG and respiratory activity of fourteen healthy volunteers was recorded with BIOPAC SYSTEM MP150 during their reading passages with five basic emotional tones and neutral tone. Twelve indexes computed off-line from ECG and respiratory activities were employed as dependent variables for statistic analysis. The results indicate that significant or marginal differences are detected between the neutral and four basic emotions, except for sadness. The physiological patterns of the five basic emotions are different. Therefore, these results provide the positive evidence for the notion that the distinct patterns of peripheral physiological activity are associated with different emotions. The findings also indicate that it is feasible and effective to recognize users’ affective states based on physiological response patterns of ECG and respiratory activities.