ISSN 1000-1239 CN 11-1777/TP

Table of Contents

15 November 2012, Volume 49 Issue 11
An Improved Multi-Label Lazy Learning Approach
Zhang Minling
2012, 49(11):  2271-2282. 
Multi-label learning deals with problems where each example is represented by a single instance while associated with multiple class labels. A number of multi-label learning approaches have been proposed recently, among which multi-label lazy learning methods have been shown to yield good generalization ability. However, existing multi-label lazy learning algorithms do not address the correlations between the different labels of an example, so their performance can be negatively affected. In this paper, an improved multi-label lazy learning approach named IMLLA is proposed. Given a test example, IMLLA first identifies its neighboring instances in the training set for each possible class. A label counting vector is then generated from those neighboring instances and fed to the trained linear classifiers. In this way, information embedded in other classes is involved in predicting the label of each class, so that the inter-label relationships of each example are appropriately addressed. Experiments are conducted on several synthetic data sets and two benchmark real-world data sets concerning natural scene classification and yeast gene functional analysis. Experimental results show that the performance of IMLLA is superior to that of other well-established multi-label learning algorithms, including one of the state-of-the-art lazy-style multi-label learners.
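The label-counting step at the heart of IMLLA can be illustrated with a minimal sketch. The data, the Euclidean distance, and the value of k below are illustrative choices, not taken from the paper; IMLLA builds per-class neighbor statistics of this kind and trains linear classifiers on top of them.

```python
def label_counting_vector(X_train, Y_train, x, k=3):
    """Count how many of x's k nearest training instances carry each label.
    IMLLA feeds statistics of this kind to per-class linear classifiers."""
    # Euclidean distances from x to every training instance
    dists = [sum((a - b) ** 2 for a, b in zip(row, x)) ** 0.5 for row in X_train]
    nn = sorted(range(len(X_train)), key=dists.__getitem__)[:k]
    # per-label count over the k nearest neighbors
    return [sum(Y_train[i][j] for i in nn) for j in range(len(Y_train[0]))]

# Toy multi-label data: 4 instances, 2 labels each
X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
Y = [[1, 0], [1, 0], [0, 1], [1, 1]]
print(label_counting_vector(X, Y, [0.05, 0.0], k=2))  # [2, 0]
```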
A Mid-Perpendicular Hyperplane Similarity Criterion Based on Pairwise Constraints
Gao Shan, Zu Chen, and Zhang Daoqiang
2012, 49(11):  2283-2288. 
Measuring the similarity between data objects is one of the primary tasks for distance-based techniques in data mining and machine learning, e.g., distance-based clustering or classification. For a given problem, a proper similarity measure makes it easier to solve. Recently, more and more studies have shown that pairwise constraints can help to obtain a good similarity measure for certain problems, with significantly improved performance. Most existing work on similarity measurement with pairwise constraints concerns distance metric learning, which uses pairwise constraints to learn a distance matrix for subsequent classification or clustering. In this paper, inspired by the hyperplanes used in nearest neighbor and support vector machine classifiers, we propose a new similarity criterion called mid-perpendicular hyperplane similarity (MPHS), which can effectively learn from pairwise constraints, especially cannot-link constraints. We then apply it to clustering and classification tasks. Finally, we validate the effectiveness of the proposed method by comparing it with several state-of-the-art algorithms through extensive experiments on a number of benchmark datasets.
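The abstract does not spell out how MPHS is constructed, but one geometric ingredient suggested by its name, the mid-perpendicular (perpendicular-bisector) hyperplane of a cannot-link pair, can be sketched as follows; all function names and data are illustrative assumptions, not the paper's formulation.

```python
def mid_perpendicular_hyperplane(a, b):
    """Hyperplane w.x = c equidistant from points a and b (the perpendicular
    bisector of segment ab), e.g. for a cannot-link pair (a, b)."""
    w = [ai - bi for ai, bi in zip(a, b)]          # normal vector a - b
    mid = [(ai + bi) / 2 for ai, bi in zip(a, b)]  # midpoint of the segment
    c = sum(wi * mi for wi, mi in zip(w, mid))     # offset so mid lies on it
    return w, c

def side(w, c, x):
    """+1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) - c > 0 else -1

w, c = mid_perpendicular_hyperplane([2.0, 0.0], [0.0, 0.0])
# points near one cannot-link endpoint fall on that endpoint's side
print(side(w, c, [1.9, 0.3]), side(w, c, [0.1, -0.2]))  # 1 -1
```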
Large-Scale Image Annotation via Random Forest Based Label Propagation
She Qiaoqiao, Yu Yang, Jiang Yuan, and Zhou Zhihua
2012, 49(11):  2289-2295. 
Image annotation plays an important role in content-based image retrieval. Since annotating images is expensive, researchers have proposed many methods that exploit the large amount of unlabeled data to improve the performance of classifiers. Among these methods, label propagation has proven effective in many applications. With the proliferation of digital photography, however, the number of images is growing very quickly, and existing label propagation approaches cannot handle real-world large-scale problems because they need to construct graph structures over all instances. In this paper, we propose a novel large-scale algorithm for image annotation, called RFLP, which combines the strengths of random forests and label propagation. Random forests are used because they scale and generalize well, and the locality of decision trees allows large-scale data to be compressed. RFLP first reduces the large-scale problem to a small-scale one via random decision trees; a traditional label propagation approach then propagates labels over the compressed data quite efficiently; finally, the propagation results are spread to all the unlabeled instances using the random forest. Experimental results show that, compared with traditional label propagation methods, the proposed RFLP is effective and significantly cost-saving.
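The propagation stage that RFLP runs on the compressed data can be sketched with the classic iterative label propagation scheme on a small similarity graph. The graph, the damping value alpha, and the update rule below are the standard textbook choices, not RFLP's exact formulation.

```python
def propagate(W, labels, n_iter=50, alpha=0.9):
    """Iterative label propagation on similarity graph W.
    labels[i] is a class index for seed nodes, None for unlabeled ones."""
    n = len(W)
    classes = sorted({l for l in labels if l is not None})
    Y = [[1.0 if labels[i] == c else 0.0 for c in classes] for i in range(n)]
    F = [row[:] for row in Y]                    # current soft label scores
    deg = [sum(W[i]) or 1.0 for i in range(n)]   # row sums for normalization
    for _ in range(n_iter):
        # spread neighbor scores, keep seeds partially clamped via Y
        F = [[alpha * sum(W[i][j] * F[j][k] for j in range(n)) / deg[i]
              + (1 - alpha) * Y[i][k]
              for k in range(len(classes))]
             for i in range(n)]
    return [classes[row.index(max(row))] for row in F]

# Chain graph 0-1-2-3 with seeds of different classes at both ends
W = [[0, 1, 0, 0], [1, 0, 1, 0], [0, 1, 0, 1], [0, 0, 1, 0]]
print(propagate(W, [0, None, None, 1]))  # [0, 0, 1, 1]
```

Each unlabeled node ends up with the class of its nearest seed, which is the behavior RFLP relies on after compressing the instance set.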
LDA-CRF: Object Detection Based on Graphical Model
Guo Qiaojin, Li Ning, Yang Yubin, and Wu Gangshan
2012, 49(11):  2296-2304. 
Object detection and recognition are actively studied in computer vision and machine learning. In recent years in particular, topic models such as latent Dirichlet allocation (LDA) have achieved great success in unsupervised recognition and localization of objects. However, LDA ignores the spatial relationships among image regions. To address this issue, the conditional random field (CRF) introduces local dependence to improve the classification accuracy of image patches. In this paper, we propose a latent Dirichlet allocation-conditional random field (LDA-CRF) model by combining LDA with CRF. The CRF is trained with topic features generated by LDA, while LDA generates topic information by utilizing structured class labels provided by the CRF. Experimental results show that LDA-CRF performs better than CRF in object detection and recognition.
An Attribute Reduction Algorithm Based on Instance Selection
Wang Xizhao, Wang Tingting, and Zhai Junhai
2012, 49(11):  2305-2310. 
Computing attribute reductions plays an essential role in the framework of supervised learning based on rough sets. The attribute reduction algorithm based on the discernibility matrix is one of the most commonly used: given an information system, all reductions can be found with it. However, this algorithm suffers from two main problems: large memory requirements and long response times. For a large database in particular, storing the discernibility matrix is the bottleneck. To tackle this problem effectively, an attribute reduction algorithm based on instance selection is proposed. The algorithm consists of three stages: firstly, the most informative instances are selected from the training set; secondly, the discernibility matrix is computed using only the selected instances; finally, all reductions are found. The experimental results show that the proposed method can efficiently reduce both the time and space complexity, especially on large databases.
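The discernibility matrix at the core of this scheme can be sketched on a toy decision table; the table, attribute indexing, and representation as a dictionary of attribute sets are illustrative.

```python
def discernibility_matrix(table, decision):
    """Entry (i, j): the set of condition attributes that distinguish
    instances i and j, recorded only when their decision values differ."""
    n = len(table)
    M = {}
    for i in range(n):
        for j in range(i + 1, n):
            if decision[i] != decision[j]:
                M[(i, j)] = {a for a in range(len(table[0]))
                             if table[i][a] != table[j][a]}
    return M

# Toy decision table: 3 instances, 2 condition attributes, 1 decision
T = [[1, 0], [1, 1], [0, 1]]
d = [0, 1, 1]
print(discernibility_matrix(T, d))  # {(0, 1): {1}, (0, 2): {0, 1}}
```

Selecting informative instances first shrinks n, and since the matrix has O(n^2) entries, this is exactly where the paper's memory savings come from.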
A Novel Graph Classification Approach Based on Embedding Sets
Wang Guijuan, Yin Jian, and Zhan Weixu
2012, 49(11):  2311-2319. 
With the development of highly efficient graph data collection technology in many scientific application fields, classification of graph data has become an important topic in the machine learning and data mining community. Many graph classification approaches have been proposed. Some take three steps: mining frequent subgraphs, selecting feature subgraphs from the mined frequent subgraphs, and constructing a classification model from the feature subgraphs. Others take two steps: mining discriminative subgraphs directly from the graph data and learning a classification model from the discriminative subgraphs. However, when mining frequent or discriminative subgraphs, these approaches exploit only the structural information of a pattern and do not consider its embedding information, even though some efficient subgraph mining algorithms can maintain the embedding information of a pattern. We propose a graph classification approach that employs a novel subgraph encoding scheme with category labels and adopts a feature subgraph selection strategy based on category information. Meanwhile, during frequent subgraph mining, we make full use of embedding sets to select the feature subgraphs, and classification rules are generated in a single step. Experimental results show that the proposed approach is effective and feasible for classifying chemical compounds.
Subtraction Operation between Temporal Points with Granularities Based on Granularity Hierarchy Mapping
Zuo Yayao, Tang Yong, and Shu Zhongmei
2012, 49(11):  2320-2327. 
Temporal modeling and calculus are fundamental logic problems in temporal information systems. The subtraction operation between two temporal points with arbitrary granularities, which underlies temporal assertions, is hard to handle. From the point of view of granular computing, the semantics and properties of temporal granularities are discussed according to the temporal partition, and temporal primitives are characterized based on temporal granularities. Furthermore, a method for the granularity subtraction operation is proposed based on the mapping and conversion between different granularity hierarchies: temporal points can be mapped to a countable set at any temporal granularity, so the subtraction of two temporal points can be converted into the subtraction of two natural numbers. The equivalence between the temporal domain T and the natural number set N and the continuity of the mapping are discussed and proved, which shows that the proposed method is correct and overcomes the effect of flexible temporal granularities. Finally, an algorithm for temporal granularity conversion functions is given in the context of the calendar system. The proposed method can be adapted to the subtraction of two temporal points from any irregular or user-defined temporal point set.
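The reduction of temporal-point subtraction to natural-number subtraction can be sketched at day granularity using the standard calendar system; the helper names are illustrative, and the paper's conversion functions handle arbitrary and user-defined granularities rather than only days.

```python
from datetime import date

def to_naturals(y, m, d):
    """Map a calendar temporal point to a natural number (its day index)."""
    return date(y, m, d).toordinal()

def subtract_days(p, q):
    """Subtract two temporal points at day granularity by reducing the
    operation to subtraction of natural numbers."""
    return to_naturals(*p) - to_naturals(*q)

print(subtract_days((2012, 11, 15), (2012, 1, 1)))  # days between the points
```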
Low-Resolution Face Recognition in Semi-Paired and Semi-Supervised Scenario
Zhou Xudong, Chen Xiaohong, and Chen Songcan
2012, 49(11):  2328-2333. 
In real environments such as surveillance, there are a large number of low-resolution (LR) faces that need to be recognized. Compared with high-resolution (HR) faces, LR faces have fewer discriminative details, so recognizing them is more difficult. To improve LR face recognition accuracy, recent research constructs LR face recognition systems using not only the LR faces but also the HR faces corresponding to them. However, such work has two deficiencies: 1) the HR and LR faces are required to be fully paired; 2) the construction of the recognition system does not utilize any class information. In fact, HR and LR faces are usually only partially paired (semi-paired) and their class labels are only partially known (semi-supervised). Accordingly, a semi-paired and semi-supervised algorithm for LR face recognition is developed to overcome these deficiencies. To utilize the semi-paired and semi-supervised data more effectively, the algorithm is implemented in two stages: a semi-paired learning stage and a semi-supervised learning stage. Promising experimental results on the Yale and AR face databases show the feasibility and effectiveness of the proposed method.
Formal Method Based on Petri Nets to Detect RFID Event
Sun Jinan, Huang Yu, Huang Shuzhi, Zhang Shikun, and Yuan Chongyi,
2012, 49(11):  2334-2343. 
Radio frequency identification (RFID) provides fast collection of large volumes of data and can be used to identify physical objects with unique IDs. To provide semantically meaningful data to different applications, RFID data need to be processed to discover user-defined complex events. A Petri-net-based method for the detection of complex events in RFID is proposed. A model named ED-net is introduced to specify the semantics of complex events, and it also serves as the basis for the implementation of an event detector. The ED-net formal model is an extension of ordinary Petri nets that provides user-defined types, functions and expressions, which are suitable for precisely describing the attributes and constraints of RFID complex events, including non-temporal, temporal and parameterized constraints. Taking advantage of the token-flow mechanism of ED-nets, we develop a step-by-step detection method that signals the occurrence of a complex event as soon as a corresponding place in the ED-net is marked. By modeling all the events to be detected in one ED-net model, repeated detection of common sub-events shared by different complex events is avoided. Finally, experimental evaluations demonstrate an obvious reduction in average complex-event detection latency and CPU resource occupation, verifying the efficiency and advantages of this detection method.
Tag-TextRank: A Webpage Keyword Extraction Method Based on Tags
Li Peng, Wang Bin, Shi Zhiwei, Cui Yachao, and Li Hengxun
2012, 49(11):  2344-2351. 
Keyword extraction extracts representative keywords from texts and is widely used in text processing applications. In this paper, we explore the use of tags to improve webpage keyword extraction. Specifically, we first analyze the characteristics of bookmarking behavior and find that people usually use the same tags to label multiple topic-related webpages: over 90% of labeled webpages can find relevant webpages through their tag information. Based on this observation, we propose a method called Tag-TextRank. As an extension of the classic keyword extraction method TextRank, Tag-TextRank calculates term importance on a weighted term graph, where the edge weight for a term pair is estimated from the statistics of the relevant documents introduced by a given tag of the target webpage. The final importance score of a term combines these tag-dependent importance scores. Tag-TextRank measures term relations using more documents and can thus better estimate term importance. Experimental results on a publicly available corpus show that Tag-TextRank outperforms TextRank on various metrics.
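The TextRank-style iteration that Tag-TextRank extends can be sketched on a tiny weighted term graph; the damping factor, the graph, and the fixed iteration count are the standard TextRank choices, not the paper's tag-dependent weighting.

```python
def textrank(weights, d=0.85, n_iter=50):
    """PageRank-style scores on a weighted undirected term graph.
    weights[(u, v)] is the co-occurrence weight between terms u and v."""
    nodes = sorted({t for edge in weights for t in edge})
    adj = {u: {} for u in nodes}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    out = {u: sum(adj[u].values()) for u in nodes}  # total outgoing weight
    score = {u: 1.0 for u in nodes}
    for _ in range(n_iter):
        score = {u: (1 - d) + d * sum(w * score[v] / out[v]
                                      for v, w in adj[u].items())
                 for u in nodes}
    return score

g = {("tag", "web"): 2.0, ("web", "page"): 1.0, ("tag", "page"): 1.0}
s = textrank(g)
# "tag" and "web" share the heaviest edge and outrank "page"
print(sorted(s, key=s.get, reverse=True))
```

Tag-TextRank then estimates these edge weights from documents retrieved per tag and combines the resulting tag-dependent scores into a final term ranking.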
Combining Content and Link Analysis for Local Web Community Extraction
Zhang Xianchao, Xu Wen, Gao Liang, and Liang Wenxin
2012, 49(11):  2352-2358. 
Most studies on Web community extraction focus purely on link analysis, neglecting the textual properties of the Web pages interconnected by complex hyperlinks. An improved algorithm based on Flake's maximum-flow method is proposed in this paper. Based on the observation that the more similar the contents of two pages are, the more authority they exchange, the lexical similarity of Web pages is used to assign edge capacities. Two assignment methods, MT (Max-flow+TF-IDF) and MTS (Max-flow+TF-IDF+Seeds), are introduced. Furthermore, we propose an efficient ranking scheme that strengthens the differences between community members according to the similarity of their content to the community topics. When choosing the highest-ranked nodes in the new method, the high quality of newly labeled seeds is ensured by taking the lexical similarity between a node and the seeds into account. The experimental results indicate that the content-combined approach can effectively handle a variety of data sets, increasing the size and quality of the extracted community and ranking community pages more reasonably.
Web Forum Thread Summarization Based on Dynamic Topic Modeling
Ren Zhaochun, Ma Jun, and Chen Zhumin
2012, 49(11):  2359-2367. 
Because there is currently no effective document summarization method for Web forum threads, this paper proposes a Web forum thread summarization method based on a latent Dirichlet allocation (LDA) topic model. To handle topic dependencies and the topic drifting problem, we consider the reply relations among posts in topic modeling, and let the distribution of each topic evolve dynamically as the thread discussion changes. We use the Gibbs EM algorithm to estimate the parameters of the proposed dynamic topic model and determine the importance of each topic by the sum of its weight over all sentences. Finally, we score sentences based on the probability distribution of topics and generate the summary of each thread. The experimental results on two different forum data sets show that the new method outperforms several widely used summarization methods in terms of ROUGE metrics.
Empirical Study on Rare Query Categorization
Yao Ting, Zhang Min, Liu Yiqun, Ma Shaoping, and Ru Liyun
2012, 49(11):  2368-2375. 
Rare queries are those that users submit to search engines very infrequently. They account for a large fraction of distinct queries and greatly affect users' experience, but little work has been done on rare queries in existing user behavior analysis due to the data sparseness problem. In this paper we conduct an empirical study characterizing user behaviors on rare queries and obtain an overview of rare query composition, using large-scale search logs collected from a commercial search engine. Based on the analysis of several features involving behaviors on the goal query, related queries and the entire session, we propose a semi-supervised categorization framework and use a modified AdaBoost classifier to classify rare sessions. The results are evaluated on 2,000 randomly sampled rare sessions, and the average AUC value is over 83%. This work will be helpful for Web search studies, including user behavior analysis concerning rare queries.
Sentiment Classification Analysis Based on Extraction of Sentiment Key Sentence
Lin Zheng, Tan Songbo, and Cheng Xueqi
2012, 49(11):  2376-2382. 
A key problem of sentiment analysis is to determine whether the polarity of a review is positive (thumbs up) or negative (thumbs down). Unlike topic-based text classification, where high accuracy can be achieved, sentiment classification is a hard and complicated task. One of the main challenges for document-level sentiment classification is that not every part of a document is equally informative for inferring the polarity of the whole document. Thus, distinguishing key sentences from trivial sentences helps improve sentiment classification performance. We divide a document into key sentences and detailed sentences: key sentences are usually brief but discriminative, while detailed sentences are diverse and ambiguous. For key sentence extraction, our approach takes three attributes into account: a sentiment attribute, a position attribute and a special-words attribute. To exploit the discrepancy and complementarity of key sentences and detailed sentences, we incorporate both in supervised and semi-supervised learning. In supervised sentiment classification, a classifier combination approach is adopted because the original document is divided into two different and complementary parts; in semi-supervised sentiment classification, a co-training algorithm is proposed to incorporate unlabeled data. Experimental results across eight domains show that our approach performs better than the baseline method and that the key sentence extraction is effective.
Interface Integration of Deep Web Based on Ontology
Wang Ying, Zuo Xianglin, Zuo Wanli, and Wang Xin
2012, 49(11):  2383-2394. 
A significant amount of information in the Deep Web can only be accessed through the query interface of a back-end database, instead of by traversing static URL links. In order to access domain-specific databases simultaneously, it is important to construct an integrated interface that allows uniform access to disparate relevant sources. Therefore, a novel ontology-based method of interface integration is proposed in this paper. It comprises two aspects: schema matching and schema merging. Schema matching accurately identifies the semantic correspondences among attributes from different interfaces by exploiting the “bridge” effect of an ontology, which can match many schemas and find all mapping relationships at one time. Schema merging merges the source query interfaces into a unified schema based on the mapping relationships identified by schema matching; the unified schema should cover the unique attribute features and orderings of the given set of interfaces as much as possible. A detailed experimental evaluation indicates that ontology-based interface integration not only reduces the complexity of schema matching, compared with finding pairwise attribute correspondences in isolation, but also greatly improves the integration accuracy of interfaces. Therefore, the ontology-assisted interface integration approach is feasible and effective.
Blog Sentiment Orientation Analysis Based on Dependency Parsing
Feng Shi, Fu Yongchen, Yang Feng, Wang Daling, and Zhang Yifei
2012, 49(11):  2395-2406. 
Nowadays, blogs, which contain rich opinion and emotion information, have become an important platform for netizens to exchange sentiments on the Web, and blog search provides a convenient way for these exchanges. In many cases, users searching the blogosphere pay more attention to bloggers' opinions and sentiments about an event, but most existing blog search engines return results based only on topics, without considering sentiment orientation. In this paper, an algorithm called SOAD (sentiment orientation analysis based on syntactic dependency) is proposed for analyzing the sentiment orientation of blogs based on dependency parsing. A Chinese blog search engine prototype system is built on the proposed algorithm, which reprocesses blog search results for sentiment analysis. Experiments show that the SOAD algorithm is advantageous and that the prototype system achieves the goal of this paper: blog search results are returned according to the blogger's sentiment orientation.
Query Clustering Based on Query Requirements for Ranking
Hua Guichun, Zhang Min, Liu Yiqun, Ma Shaoping, and Ru Liyun
2012, 49(11):  2407-2413. 
Ranking is an essential part of information retrieval. Nowadays there are hundreds of features for constructing ranking functions, and how to use these features to construct more effective ranking functions is a hot research topic. Learning to rank, an interdisciplinary field of information retrieval and machine learning, has therefore attracted increasing attention. Queries can be classified into several types according to different criteria, and the importance of ranking features differs across query types, so applying one general ranking function to all queries is impractical. In this paper, we analyze query features based on keyword matching and construct query feature vectors from the selected query features. The queries are then clustered into several clusters, and a ranking function is learned for each cluster. Finally, the fittest ranking function is chosen for each newly arriving query and used to rank the documents. The experimental results show that ranking functions based on query clustering with selected query features are comparable with, or even outperform, the one trained on all queries.
Literal Tainting Method for Preventing Code Injection Attack in Web Application
Wang Yi, Li Zhoujun, and Guo Tao
2012, 49(11):  2414-2423. 
Nearly every Web application faces the threat of code injection attacks such as XSS (cross-site scripting) and SQL injection. The flaw occurs when a Web application takes data originating from a user without validating or encoding the content, allowing malicious input to run as part of a database query or as script in the response Web page, which destroys data integrity or leaks user privacy. To counter this trend, we present a literal tainting method for Web applications and argue that it is an efficient and easy-to-deploy solution for preventing such attacks. The approach hardens the server-side script with a customizable security filtering policy for full prevention of code injection attacks. Although instrumentation of the Web application is needed, we show that the process is fully automated and sound, so the approach is practical even for large Web applications. In preliminary experiments on several real-world PHP applications with PHPHard, a prototype tool implementing the technique, we find that the literal tainting method can successfully prevent XSS by removing injected malicious script. In comparison with traditional taint propagation methods, it shows advantages in both precision and effectiveness while causing only fairly acceptable overhead.
An Improved Minimalist Mutual-Authentication Protocol
Zhang Xuejun, Cai Wenqi, Sun Zhixin, and Wang Suoping
2012, 49(11):  2424-2431. 
An improved minimalist mutual-authentication protocol (IM2AP) is proposed that protects the messages transmitted between the reader and the tag. The tag and reader achieve mutual authentication through a shared pseudonym and key, and then begin to transmit private messages. To ensure that the mutual authentication of the tag and reader cannot be defeated by malicious interference, the tag generates a random number from the Hamming weight of the key shared with the reader, and uses this random number to protect the transmitted message with a cyclic shift. Thus an attacker cannot tamper with a particular bit, and the protocol effectively avoids de-synchronization and full-disclosure attacks. Analysis of security and performance shows that, whereas lightweight security authentication protocols are generally weak, this protocol improves the security and reliability of the system at limited cost, and therefore has high practical value.
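The Hamming-weight-driven cyclic shift described above can be sketched as follows. The key, message, and 8-bit word width are illustrative; the actual IM2AP message formats and key updates are not given here.

```python
def hamming_weight(x):
    """Number of 1 bits in x: the shift amount both sides can derive."""
    return bin(x).count("1")

def rotate_left(value, shift, width=8):
    """Cyclic left shift of a width-bit value."""
    shift %= width
    mask = (1 << width) - 1
    return ((value << shift) | (value >> (width - shift))) & mask

key = 0b10110010            # shared secret; its Hamming weight is 4
msg = 0b00001111            # message to protect
r = hamming_weight(key)     # shift amount known to both tag and reader
protected = rotate_left(msg, r)
recovered = rotate_left(protected, 8 - r)  # receiver undoes the rotation
print(bin(protected), bin(recovered))
```

Because the shift amount is derived from the shared key, an eavesdropper who does not know the key cannot tell which rotated position corresponds to a particular message bit.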
A Privacy-Preserving Data Publishing Method Based on Genetic Algorithm with Roulette Wheel
Hu Xinping, He Yuzhi, Ni Weiwei, and Zhang Yong
2012, 49(11):  2432-2439. 
Privacy-preserving micro-data publishing for clustering is an important issue in data mining research, aiming to protect the privacy of individual data while retaining enough clustering usability in the published data. Different from traditional distance-preserving and distribution-preserving solutions, a data perturbation method RWSGA (roulette wheel selection genetic algorithm) is proposed from the viewpoint of maintaining the stability of neighboring relations in the dataset during obfuscation. Genetic methods based on roulette wheel selection are adopted to obfuscate data by building an analogy between crossover and mutation operations and data perturbation. Firstly, the method randomly chooses a pair of data points from the k-neighborhood of a data point using the roulette wheel strategy. Subsequently, tailored crossover or mutation operations are applied to the selected pair of data points to protect micro-data values from leakage while guaranteeing the stability of the corresponding k-neighborhood. Furthermore, to avoid overly large changes caused by mutation, an optimization improves the choice of the mutation domain by specifying centers of k-nearest neighborhoods in denser regions of the data space. Theoretical analysis and experimental results show that RWSGA modifies the published micro-data values greatly from their original values while keeping the clustering difference between the original and published datasets small.
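The roulette wheel selection step that RWSGA uses to pick perturbation candidates can be sketched as follows; the fitness values are illustrative, and RWSGA applies the selection within k-neighborhoods rather than over an arbitrary list.

```python
import random

def roulette_select(fitness, rng=random.random):
    """Pick an index with probability proportional to its fitness value."""
    total = sum(fitness)
    r = rng() * total          # spin the wheel: a point on [0, total)
    acc = 0.0
    for i, f in enumerate(fitness):
        acc += f               # walk the cumulative fitness slices
        if r <= acc:
            return i
    return len(fitness) - 1    # guard against floating-point rounding

random.seed(0)
picks = [roulette_select([1.0, 3.0]) for _ in range(10000)]
# index 1 (fitness 3) should be chosen roughly 3x as often as index 0
print(picks.count(1) / len(picks))
```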
A Kind of Safe Typed Memory Model for C-Like Languages
He Yanxiang, Wu Wei, Chen Yong, Li Qingan, and Liu Jianbo
2012, 49(11):  2440-2449. 
Verifying programs by formal methods is an important way to ensure that software is trustworthy. For a low-level imperative language like C, which can operate on memory directly, a suitable memory model is needed to formalize the operational or axiomatic semantics. The traditional byte-level memory model can describe various memory operations very well; however, it cannot guarantee safety and it makes program verification complex. The memory model of object-oriented languages is high-level: it is suitable for program verification but not for describing low-level memory operations. A safe typed memory model is proposed that combines the two kinds of models above. It can be used to formalize semantics and to verify programs based on Hoare logic. This memory model allows pointer arithmetic, structure assignment, type casts and other memory operations, and makes the verification of pointer-based programs easier. The memory model is formalized and verified using the Coq proof assistant.
A Web Services Interaction Behavior-Environment Model Based on Generalized Stochastic Petri Nets
Zhu Jun, Guo Changguo, and Wu Quanyuan
2012, 49(11):  2450-2463. 
The reliability and performance of an individual Web service are related to users' locations, because service interaction behaviors are easily influenced by the unpredictable Internet environment. By coordinating interactions between individual Web services, service composition can carry out much more complex functions; as a result, service composition is also greatly affected by the Internet environment. However, current modeling methods for service composition focus on the composition itself without the impact of the Internet environment, which causes great deviation between the actual situation and the evaluation results of such methods. This paper introduces a Web service interaction behavior-environment model based on generalized stochastic Petri nets. The modeling method takes into consideration not only the situations of individual Web services but also the factors of the Internet environment. The model is used to evaluate the reliability and performance of service compositions and to predict service interaction behaviors. Qualitative and quantitative analysis can also be performed with this modeling method, so as to guide the optimization of service composition plans and to evaluate the impact of the Internet environment on service interaction behaviors. Finally, a case study is provided with full evaluations of the reliability and performance of a service composition and of its service interaction behavior properties.
Failure-Recovery-Strategies-Based QoS Estimation for Web Composition Transaction
Mei Xiaoyong, Li Shixian, Huang Changqin, and Zheng Xiaolin
2012, 49(11):  2464-2480. 
Abstract ( 662 )   HTML ( 1)   PDF (5112KB) ( 445 )  
To assure their transactional properties, composite-Web-service-based applications require more transactional support than traditional transactions, which has inevitably led to the need for appropriate failure recovery mechanisms that can provide sustainable and reliable execution semantics. In this paper, a comprehensive transaction recovery model is proposed, which includes forward recovery, backward recovery and alternative recovery. Failure recovery strategies are separated from the business flow so that failure recovery can be modeled and programmed dynamically. With respect to advanced aggregation patterns, since a failed task affects the execution of the Web composition transaction, and the existing QoS evaluation methods do not consider failure recovery, they cannot be used directly for QoS estimation of Web composition transactions. Considering both correct and failed executions, a QoS estimation model based on the performance of composition transactions is proposed. The results show that the proposed model can evaluate and analyze the performance of composition transactions effectively, reduce the failure rate of composition transactions, and eliminate the bad effects of failures efficiently.
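The effect of recovery strategies on QoS can be seen in a small expected-value calculation. The sketch below is an illustrative model under simple assumptions (not the paper's estimation formulas): a task of duration t succeeds with probability p; on failure it is retried up to a bounded number of times (forward recovery), and if all retries fail, an always-successful alternative service of duration t_alt takes over (alternative recovery).

```python
def expected_time(t, p, retries, t_alt=None):
    """Expected execution time of one task under forward recovery
    (up to `retries` re-executions) with an optional alternative service."""
    exp, p_reach = 0.0, 1.0        # p_reach: probability this attempt runs
    for _ in range(retries + 1):   # first attempt plus retries
        exp += p_reach * t         # we pay t whenever the attempt runs
        p_reach *= (1.0 - p)       # only a failure leads to the next attempt
    if t_alt is not None:          # all attempts failed: alternative recovery
        exp += p_reach * t_alt
    return exp
```

For instance, with t = 1, p = 0.5, one retry, and a fallback of duration 2, the expected time is 1 + 0.5·1 + 0.25·2 = 2.0. Composing such per-task estimates along sequence and parallel aggregation patterns yields composition-level QoS.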
PMTree: An Efficient Pattern Matching Method for Event Stream Processing
Cheng Sujun, Wang Yongjian, Meng You, Cheng Zhendong, Luan Zhongzhi, and Qian Depei
2012, 49(11):  2481-2493. 
Abstract ( 949 )   HTML ( 5)   PDF (3432KB) ( 544 )  
Complex event processing focuses on analyzing and extracting event sequences of a specific pattern from continuous event streams. Under high-throughput conditions, recognizing event sequences quickly and accurately has become an important problem. The state-of-the-art pattern matching methods, i.e. NFA, Petri nets and DAG, have shortcomings in expressive ability and incur high cost to support some requirements. To deal with this situation, we propose a tree-based pattern matching method, PMTree. PMTree defines an event model and corresponding event relation operators, maps event patterns to specific nodes in the tree, applies time-predicate constraints on these nodes, and finally joins them to build a PMTree. We study optimization strategies in tree construction which can reduce the pattern matching cost and search for the optimal combination of tree nodes, providing a cost model and an optimization algorithm. Experiments show that PMTree is more efficient than Esper, an open-source complex event processing engine: in the same situation its processing speed can be 3 to 6 times faster than Esper's, and its performance remains stable under varying conditions, e.g. the number of events, the type of event sequence, or the complexity of the event sequence.
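The shape of tree-based matching can be conveyed with a stripped-down example. The node types below are simplified stand-ins chosen only to show tree-shaped evaluation, not PMTree's actual operators: leaves test individual events, and an internal "followed-by" node combines two sub-patterns under a time-window constraint, in the spirit of the paper's time-predicate constraints on nodes.

```python
class Leaf:
    """Matches the first event of a given type at or after `start`."""
    def __init__(self, etype):
        self.etype = etype

    def match(self, events, start):
        # events: list of (type, timestamp); return the index just past
        # the first matching event, or None if no match exists
        for i in range(start, len(events)):
            if events[i][0] == self.etype:
                return i + 1
        return None


class FollowedBy:
    """Matches left, then right after it, within a time window."""
    def __init__(self, left, right, window):
        self.left, self.right, self.window = left, right, window

    def match(self, events, start):
        mid = self.left.match(events, start)
        if mid is None:
            return None
        end = self.right.match(events, mid)
        if end is None:
            return None
        # time-predicate constraint between the two matched events
        if events[end - 1][1] - events[mid - 1][1] > self.window:
            return None
        return end
```

Matching "A followed by B within 5 time units" is then `FollowedBy(Leaf("A"), Leaf("B"), 5).match(stream, 0)`; the cost-model optimization in the paper reorders and combines such nodes to minimize work per event.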
Beacon Dynamic Movement Strategy in Node Localization for Wireless Sensor Networks
Liu Kezhong, Chen Weibo, Zhan Zhen, Zhang Jinfen, and Fu Qin
2012, 49(11):  2494-2500. 
Abstract ( 621 )   HTML ( 0)   PDF (2269KB) ( 468 )  
In node localization based on a mobile beacon in WSNs, the beacon's mobile path directly affects the efficiency and accuracy of node localization. Considering the limited communication and computational power of the network nodes, a revised virtual gravitation model for beacon movement in node localization is proposed. In the model, the beacon selects its neighbors and gathers parameters such as the distance from effective nodes and the number of neighbor nodes. On this basis, a virtual gravitation model for the mobile beacon is built, which gives the beacon's movements higher adaptability. Furthermore, to address the potential problems of zero gravitation and redundant traversal, the model is revised by adding near factors and fitting a curve to the boundary of the localized area. Simulation results show that, compared with typical traditional path planning algorithms, the proposed method shortens the beacon's traversal path length by 20% to 30%, and greatly reduces energy consumption.
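One movement step under a virtual gravitation model can be sketched as follows. The force law, step size and epsilon below are illustrative assumptions, not the paper's revised model: each neighbor pulls the beacon toward it with a force that decays with distance, and a small epsilon plays the role of a near factor, guarding the computation when pulls nearly cancel (the zero-gravitation case, where the real algorithm would redirect the beacon instead).

```python
import math

def next_position(beacon, neighbors, step=1.0, eps=1e-6):
    """Move the beacon one step along the resultant virtual gravitation.

    beacon: (x, y) position; neighbors: list of (x, y) node positions.
    """
    fx = fy = 0.0
    bx, by = beacon
    for nx, ny in neighbors:
        dx, dy = nx - bx, ny - by
        d = math.hypot(dx, dy) + eps      # eps avoids division by zero
        fx += dx / (d * d)                # inverse-distance attraction
        fy += dy / (d * d)
    f = math.hypot(fx, fy)
    if f < eps:                           # near-zero resultant gravitation:
        return beacon                     # the real model redirects here
    return (bx + step * fx / f, by + step * fy / f)
```

Iterating this step drives the beacon toward unlocalized nodes, and the path-length savings reported in the paper come from revising exactly the degenerate cases this sketch only flags.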