ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 August 2010, Volume 47 Issue 8
Key-Postures Based Automated Construction of Motion Graph
Zong Dan, Li Chunpeng, Xia Shihong, and Wang Zhaoqi
2010, 47(8):  1321-1328. 
Asbtract ( 398 )   HTML ( 0)   PDF (1544KB) ( 441 )  
Related Articles | Metrics
STM graph (snap together motion graph), with high degree of polymerization nodes, is a structured graph to describe the relationship of the motion segments in character animation. Nodes in a motion graph serve as postures and edges between these nodes correspond to motion clips. Each node in an STM graph is connected with multiple edges. Many different approaches have been proposed to construct motion graphs from the existing motion capture data, which gives the user a flexible way to synthesize natural looking motion and control the character. So it becomes a hot topic to construct motion graphs automatically. However, the current methods of constructing STM graph depend largely on the experience and manual manipulations. Focusing on the problems mentioned above, a novel method to create motion graph automatically is proposed in this paper. Dimension reduction and nonparametric density estimation analysis are adopted to extract the key postures from motion capture data. The segments are obtained to construct the motion graph with high degree of polymerization nodes. The method not only improves the accuracy of extraction of key postures and reduces the subjective factor, but also improves the flexibility of controlling the virtual characters. Experiments have been done on taekwondo motion clip with 934 frames and badminton motion clip with 1798 frames. The results show the effectiveness of the method.
Image Thresholding Based on Relevance Vector Machine
Qiao Lishan, Chen Songcan, and Wang Min
2010, 47(8):  1329-1337. 
Asbtract ( 574 )   HTML ( 0)   PDF (2757KB) ( 440 )  
Related Articles | Metrics
Image thresholding is a simple but effective segmentation approach which is widely applied in image analysis, pattern recognition and computer vision. To obtain the optimal threshold value or more threshold values, classical thresholding methods have to search the whole gray levels of a given image, based on specific optimization strategy. Recently, a novel multi-threshold technique based on support vector regression (SVR) is developed, where threshold values can be determined directly from support vectors (SV). However: 1.There are no reliable methods to select threshold values from abundant SVs; 2.The computational cost is generally high due to the cross-validation procedure to estimate the hyper-parameters incurred in SVR; 3.It does not well deal with the images with unimodal histogram. Motivated by relevance vector machine (RVM), a new method called RVM-based threholding algorithm is introduced. Under the proposed framework, the relevance vectors (RVs) are very sparse and thereby the thresholds can be easily found from a few candidate RVs. In addition, the unimodal case can also be handled well. That is, the proposed algorithm not only keeps the advantages of SVR-based thresholding method, but also solves its open problems. Experimental results validate the effectiveness, efficiency and adaptability.
The L\-2 Distances for Rational Surfaces Based on Matrix Representation of Degree Elevation
Huang Weixian and Wang Guojin
2010, 47(8):  1338-1345. 
Asbtract ( 373 )   HTML ( 2)   PDF (698KB) ( 448 )  
Related Articles | Metrics
Computing the distance between curves (surfaces) is an important subject in the computer aided geometric design and the geometric approximation. For example, when estimating the errors for the approximation of rational curves (surfaces) by degree reduction or polynomial curves (surfaces), the distance between the original curves (surfaces) and the approximating curves (surfaces) need to be calculated by an efficient way. In order to give a uniform measure for the distance between curves (surfaces), the L\-2 distance between rational curves (surfaces) based on matrix representation of degree elevation is detailedly studied. Firstly, the L\-2 distance for two rational Bézier surfaces which is based on the degree elevation is presented. Then, using the L\-2 distance and the least-squares method, a clear and uniform measure for errors in the polynomial approximation of rational Bézier surfaces is derived. What is more, based on the conversion between Bézier bases and B-spline bases, the L\-2 distance for rational Bézier surfaces is generalized to that for rational B-spline surfaces. All the formulas for the L\-2 distance between curves and surfaces in this paper are presented through matrix operations, which is convenient for computer programs, so they are applicable and useful in practice. Finally, several examples are presented.
A Blind Watermarking Algorithm for 2D Engineering Graphics Based on Entity Substitution and Mean Value of Wavelet Coefficients
Peng Fei, Huang Xiaofeng , Deng Tao, and Long Min
2010, 47(8):  1346-1353. 
Asbtract ( 459 )   HTML ( 1)   PDF (1158KB) ( 420 )  
Related Articles | Metrics
A blind watermarking algorithm for 2D engineering graphics based on entity substitution and mean value of wavelet coefficients is proposed. Firstly, each line entity of the 2D engineering graphic is substituted by a polyline with k+1 vertices, and the substitution will not change the appearance of the 2D engineering graphic. Then the coordinates of the vertices in polylines are acquired to make some transformations to obtain the invariant abilities such as translation, scaling and rotation. After that, the coordinates of the transformed vertices are constructed into different transformation units. Finally, the discrete wavelet transformation is done to every transformation unit, and watermark is embedded to the mean value of wavelet coefficients of each transformation unit. Meanwhile, chaotic encryption and chaotic decryption are also used in the watermark embedding and watermark extraction, respectively, to improve the security of the watermark algorithm. Experimental results and analysis show that the proposed algorithm is blind detection, and it has a good capability of imperceptibility and has a high efficiency. At the same time, it is robust to the attacks such as rotation, translation, even scaling and graphic addition. It may be a potential new method for the copyright protection for the 2D CAD engineering graphics.
A Survey of Physics-Based Human Motion Simulation
Xia Shihong, Wei Yi, and Wang Zhaoqi
2010, 47(8):  1354-1361. 
Asbtract ( 741 )   HTML ( 5)   PDF (767KB) ( 1266 )  
Related Articles | Metrics
Physics-based human motion simulation is a cross research field in robotics, biomechanics and computer science, whose research goal is to build a computable model for human movement such that the natural human motion satisfying the given constraints could be simulated in the virtual environment. Compared with the traditional human animation, the simulated human motion must obey the Newtons law of motion. And therefore physics-based human motion simulation can be used for not only the special effects generation but also the education and training of physical tasks. Firstly, the origin of human motion simulation and some typical applications are introduced. Secondly, the key technologies of human motion simulation are elaborated. In this part, the simplified geometry model and physics model of human used in human motion simulation is introduced. And then the current methods are classified into two types: the forward dynamics based human motion simulation and the inverse dynamics based human motion simulation. The development history and recent research of these two methods are introduced in detail. Finally, based on the simulation of interactive action between human and objects, the verification of the result of simulation and so on, some future directions in human motion simulation are discussed.
Approaches to Knowledge Reduction in Interval-Valued Information Systems
Zhang Nan, Miao Duoqian, Yue Xiaodong
2010, 47(8):  1362-1371. 
Asbtract ( 437 )   HTML ( 2)   PDF (817KB) ( 673 )  
Related Articles | Metrics
Knowledge reduction is an important problem of rough set theory. The notion of a reduction plays an essential role in analyzing an information system. A reduction is a minimum subset of attributes that can provide the same descriptive or classification ability as the full set of available attributes. In other words, attributes in a reduction are jointly sufficient and individually necessary in an information system. Original knowledge reduction in Pawlak’s rough set theory is mainly based on single-valued information systems. However, there always exist interval numbers in a lot of information systems in reality. Thus, knowledge reduction in interval-valued information systems has been a hot issue in the related area in recent years. Although some research works for knowledge reduction in interval-valued information systems have been done, there are still critical problems, such as redundancy of classification results and low accuracy of rough approximation to be dealt with. Aiming at the existing problems, the authors introduce the α-maximal consistent class into interval-valued information systems to improve the accuracy of knowledge classification and rough approximation. A novel measurement for the similarity degree of different interval numbers is proposed in this paper. Furthermore, to simplify knowledge representation, the authors present the methods for constructing the discernibility matrix and attribute reduction in interval-valued information systems or decision systems. Finally, the latent knowledge can be discovered by knowledge reduction.
Reduced Support Vector Clustering
Ling Ping, Wang Zhe, Zhou Chunguang, and Huang Lan
2010, 47(8):  1372-1381. 
Asbtract ( 735 )   HTML ( 0)   PDF (1416KB) ( 463 )  
Related Articles | Metrics
Although with multi applications in data mining, fault diagnosis, bioinformatics and other aspects, the popularity of support vector clustering (SVC) algorithm is affected by two shortcomings: expensive computation and poor performance. Focus on such two bottlenecks, a novel algorithm, reduced support vector clustering (RSVC), is proposed. RSVC shares the frame of SVC, but it consists of reduction strategy and the new labeling approach. Reduction strategy is designed according to Schrdinger equation; it extracts those data that are important to model development to form a qualified subset, and optimizes the objective on this subset. The resulting clustering model has little loss in quality while consuming less cost. The new labeling approach is based on geometric properties of feature space of Gauss kernel function; it detects clusters by clustering support vectors and other data respectively in a clear way. The geometric properties are verified to guarantee the validation of the new labeling approach. Theoretical analysis and empirical evidence demonstrate that RSVC overcomes the two bottlenecks well and has advantage over its peers in performance and efficiency. And RSVC also exhibits fine behaviors. It shows that RSVC can work as a friendly clustering method in more applications.
Study of Multi-Agent Trust Coalition Based on Self-Organization Evolution
Cheng Bailiang, Zeng Guosun, and Jie Anquan
2010, 47(8):  1382-1391. 
Asbtract ( 482 )   HTML ( 1)   PDF (1681KB) ( 450 )  
Related Articles | Metrics
Different from the existing game theory used in multi-agent coalition, this paper studies it from coalition trust. The trust degree of individual is built on history cooperating information and then the trust degree of coalition is built above it for trusting coalition. To finish more complex task, some small coalitions unite to a large coalition by coalition trust and competitive negotiation. This way makes trust run through the total process of evolvement of coalition with describing the self-organization of evolvement. In order to get stable coalition, the free competition and trust evaluation is used for distributing income among coalition. For the effective and fair distributing mechanism, the frame behavior in free competition will be eliminated by amending trust degree with the protection of private data after distributing income. Using trust, the produce structure and evolution process of coalition is described and the stable coalition is obtained by fair income distribution with private protection. The distributed cooperation model is built and the computing complex is reduced greatly by coalition trust with controllable venture income. Trust coalition will provide an effective guarantee for dynamic coalition.
A Transductive Multi-Label Classification Method for Weak Labeling
Kong Xiangnan, Li Ming, Jiang Yuan, and Zhou Zhihua
2010, 47(8):  1392-1399. 
Asbtract ( 653 )   HTML ( 0)   PDF (1248KB) ( 619 )  
Related Articles | Metrics
Multi-label learning deals with the problems when each object can be assigned to multiple categories simultaneously, which is ubiquitous in many real world applications, such as text classification, image scene classification and bioinformatics, etc. In traditional multi-label learning methods, classifiers are usually required to utilize a large amount of fully labeled training data in order to obtain good performances for multi-label classifications. However, in many real world tasks, obtaining partially labeled (weak labeled) training data is often much easier and costs less efforts than obtaining a large amount of fully labeled training data. To alleviate the assumption of large amount fully labeled training data used by traditional multi-label learning methods, the authors propose a new multi-label learning method for weak labeling (TML-WL). By reweighting the error functions on positive and negative labels of weak labeled data, TML-WL method can effectively utilize the weak labeled training data to replenish the missing labels. TML-WL method can also use the weak labeled training data to improve the classification performances on unlabeled data. Empirical studies on the real-world application of image scene classification show that the proposed method can significantly improve the performance of multi-label learning when the training data are weak labeled.
A Bi-Sparse Relational Learning Algorithm Based on Multiple Kernel Learning
Han Yanjun and Wang Jue
2010, 47(8):  1400-1406. 
Asbtract ( 484 )   HTML ( 0)   PDF (776KB) ( 456 )  
Related Articles | Metrics
Relational learning is becoming a focus in machine learning research. In relational learning, samples cannot be represented as vectors in R\+n. This characteristic distinguishes it from other machinelearning tasks in that relational learning cannot utilize the geometric structure in R\+n and thus is much harder to solve. In this paper a multiple kernel learning approach for relational learning is proposed. First of all, it is proved that for multiple kernel learning with the kernels induced by logical rules, it suffices to use the linear kernel. Based on this, through iteratively constructing rules by a modified FOIL algorithm and performing the corresponding multiple kernel optimization, the proposed approach realizes an additive model on the feature space induced by the obtained rules. The algorithm is characterized by its “bi-sparsity”, i.e., support rules and support vectors are obtained simultaneously. Moreover, it is proved that the multiple kernel learning in the feature space induced by rules is equivalent to squared \-1 SVM. The proposed algorithm is evaluated on six real world datasets from bioinformatics and chemoinformatics. Experimental results demonstrate that the approach has better prediction accuracy than previous approaches. Meanwhile, the output classifier has a straightforward interpretation and relies on a smaller number of rules.
A Classification Method for Class-Imbalanced Data and Its Application on Bioinformatics
Zou Quan, Guo Maozu, Liu Yang, and Wang Jun
2010, 47(8):  1407-1414. 
Asbtract ( 739 )   HTML ( 1)   PDF (811KB) ( 790 )  
Related Articles | Metrics
A classification method is proposed for class-imbalanced data, which is common in bioinformatics, such as identifying snoRNA, classifying microRNA precursors from pseudo ones, mining SNPs from EST sequences, etc. It is based on the main idea of ensemble learning. First, the big class set is divided randomly into several subsets equally, and it is made sure that every subset together with the small class set can make up a class-balanced training set. Then several different mechanism classifiers are selected and trained with these balanced training sets. After the multi-classifiers are built, they will vote for the last prediction when dealing with new samples. In the training phase, a strategy similar to AdaBoost is used. For each classifier, the samples will be added to the training sets of next two classifiers if they are misclassified. It is necessary to repeat modifying the training sets until a classifier can accurately predict its training set or reaching the maximum repeat times. This strategy can improve the performance of weak classifiers by voting. Experiments on five UCI data sets and three bioinformatics experiments mentioned above prove the performance of the method. Furthermore, a software program named LibID, which can be used as similarly as LibSVM, is developed for the researchers from bioinformatics and other fields.
Efficient Processing of X-Tuple Based Top-k Queries in Uncertain Database
Liu Dexi, Wan Changxuan, and Liu Xiping
2010, 47(8):  1415-1423. 
Asbtract ( 398 )   HTML ( 0)   PDF (1048KB) ( 462 )  
Related Articles | Metrics
Like top-k in traditional databases, top-k queries in uncertain databases are quite popular and useful due to its wide application usage. However, compared with top-k in traditional databases, queries over uncertain database are more complicated because of the existence of exponential possible worlds. Often, two kinds of generation rules are considered in the uncertain database: independent and mutually exclusive. An x-tuple is the union of the tuples mutually exclusive. U-kRanks queries consider each alternative in x-tuple as single one and return the tuple which has the highest probability appearing at top k or a given rank. However, no matter which alternative (tuple) of an x-tuple appears in a possible world, it is undoubtedly believed that this x-tuple appears in the same possible world accordingly. Thus, instead of ranking each individual tuple, the authors define a novel top-k query semantic in uncertain database, uncertain x-kRanks queries (U-x-kRanks), which return k x-tuples according to the score and the confidence of alternatives in x-tuples, respectively. In order to reduce the search space, they present an efficient algorithm to process U-x-kRanks queries, which can minimize the scan depth by terminating the scan process as soon as the answers are found. Comprehensive experiments on different data sets demonstrate the effectiveness of the proposed solutions.
Efficient Global Model-Checking for Propositional μ-Calculus
Jiang Hua
2010, 47(8):  1424-1433. 
Asbtract ( 558 )   HTML ( 1)   PDF (815KB) ( 450 )  
Related Articles | Metrics
Model-checking algorithms of the propositional μ-calculus fall into two categories: global and local. The state space explosion is the main problem that model-checking algorithms face. How to efficiently lower the time and space complexity of the algorithms is the research focus. Built upon the earlier work by Long, Browne, Jha and Marrero, and according to Tarskis fixpoint theorem, the process of calculating the alternating fixpoint expressions in the propositional μ-calculus, whose alternating nesting depth of fixpoint is 4, is studied carefully. Two groups of partial ordering relation between intermediate results have been found, and a global model-checking algorithm is designed for efficiently evaluating alternating fixpoint expressions in the propositional μ-calculus. This algorithm has a similar time complexity to the earlier work by Long, Browne, Jha and Marrero, i.e., O((2n+1)\+{[d/2]+1}) as opposed to O(n\+{[d/2]+1}), but it has much better space complexity, i.e., polynomial O(dn) as opposed to exponential O(n\+{[d/2]+1}), where n is the number of states in the transition system and d is the alternation depth in the formulas. The contribution is finding out the best global model-checking algorithm whose time complexity is O((2n+1)\+{d/2]+1}) and the space complexity is O(dn).
Optimizing OpenMP by Array Privatization on the Multi-Core Platform of IBM Cell
Li Jianjiang, Liu Zhenzhen, and Wang Jue
2010, 47(8):  1434-1441. 
Asbtract ( 557 )   HTML ( 0)   PDF (1002KB) ( 474 )  
Related Articles | Metrics
Array privatization is one of the keys to exploit parallelism, and SPMD is an important method for OpenMP array privatization. IBM Cell processor is a heterogeneous multi-core processor with a 64b multithreaded power PowerPC processor element (PPE) and eight synergistic processor elements (SPE) connected by an internal, high-bandwidths element interconnect bus (EIB). However, IBM XLC can’t compile SPMD style OpenMP successfully which is a compiler of IBM multi-core platform. In order to solve this problem, aiming at making good use of static buffer in local store to reduce DMA operations, a technique is presented for array privatization with regard to the implementation on the multi-core platform of IBM Cell. By developing some methods for optimization of OpenMP applications, a good use is made of static buffer in local store to reduce DMA operations that focuses on the privatization of being reused data, the balance of computation and communication. The technique includes array privatization analysis, array privatization translation, and involving eliminating barriers and non-blocking DMA operations which directs reused data to wider range. The translation of Jacobi code shows that it improves average three percent of the performance. Furthermore, it is more efficient to be applied to array privatization of small scale OpenMP computation on the multi-core platform of IBM Cell.
Analysis of Service Replaceability on Behavior Effect
Liu Ying, Zhang Yichuan, Zhang Bin, Zhang Mingwei, and Zhu Zhiliang
2010, 47(8):  1442-1449. 
Asbtract ( 420 )   HTML ( 0)   PDF (769KB) ( 501 )  
Related Articles | Metrics
Web service composition is a key issue in Web service research area. Service replacement refers to the problem of identifying a service that can replace another service in the context of a composition with a specified functionality, and the analysis of service replaceability is an important issue for supporting seamless service integration and collaboration. It is known that performing formal analyses to verify the consistency of the composition effect, before and after replacement, is helpful for service dynamic composition. Existing solutions to this problem rely on analyzing service replaceability of the service, according to both the consistency of operations and the operation sequence, but decrease the space of replaceable services. Consequently, the authors propose a determination method of service replaceability on service effect. They use colored Petri nets as formalism foundation modeling and specifying Web services and reasoning on behavioral features of Web services composition. The whole work is achieved by modelling the service behavior using colored Petri nets, defining the trigger condition and result as a behavior effect, introducing the concept of service behavior consistency, and listing the algorithms to implement the process. The theorem presented and proved illustrates that the analysis approach is very useful for correctly building and reliably executing the composing service after service replacement.
VSR: Data Gathering in Opportunistic Mobile Sensor Network with Multiple Sinks
Xiong Yongping, Sun Limin, Ma Jian, and Niu Jianwei5
2010, 47(8):  1450-1458. 
Asbtract ( 406 )   HTML ( 2)   PDF (2000KB) ( 479 )  
Related Articles | Metrics
Opportunistic mobile sensor networks can be applied in some scenarios such as wild animal monitoring and urban sensing utilizing sensors embedded into handheld devices to collect urban information. In these applications, the gathered data usually need to be transmitted from the source node to one of multiple base stations (sink nodes). The authors propose the VSR (virtual space-based routing) scheme adapting “store-carry-forward” paradigm to transmit the data message to the base stations. In VSR, each sensor estimates the expected delivery delay to all base stations based on the meeting history between the node and the base stations. Then, each sensor is mapped into a coordinate point of a high dimensional space according to its delivery delays to all sinks. All sink nodes are corresponding to the origin of the space. The forwarding metric is defined as the Euclidean distance of the node to the origin. When two nodes encounter, the node with the higher such metric forwards the carried messages to the peer with the lower metric, until the messages are delivered to any sink nodes. VSR is robust to dynamic network because of its fine-grained forwarding decision and is appropriate for sensor node due to its low computing and storage overhead. Experiments under two random scenarios show that VSR outperforms the history-based routing proposed in ZebraNet and random forwarding scheme.
An Unequal Clustering Algorithm in Wireless Sensor Networks Based on Traffic Prediction
Li Jie and Han Zhijie
2010, 47(8):  1459-1465. 
Asbtract ( 435 )   HTML ( 0)   PDF (1359KB) ( 377 )  
Related Articles | Metrics
Wireless sensor networks are composed of a large number of sensor nodes that are densely deployed inside the phenomenon. An unequal cluster of routing algorithm is proposed based on divided virtual region for “hot spot” in clustering routing protocol with many jumps in the wireless sensor network. The routing algorithm gives the task of dividing clusters to the sink node with non-limited energy, such that the scale of inside cluster layer near the sink node is smaller than that of outer layer cluster. To realize the distributed election work of cluster head and avoid energy consumption in each wheel, main and vice cluster head is taken into the structure of clusters. The Markov forecast model is used to the replacement process of main cluster head. Thus, both the death of main cluster head due to exhausting energy completely and the network division because of the death are avoided. The life of the network is also reduced. Using the NS2.26 platform to simulate the proposed algorithm, the result indicates that compared with the traditional routing algorithm, this routing algorithm can prolong the life of WSN and enlarge the survival node number of WSN.
MEANS: A Micro-thrEad Architecture for Network Servers
Gong Yili, Lei Yingchun, Zhang Wen, Wu Chanle, and Zhang Guoan
2010, 47(8):  1466-1480. 
Asbtract ( 511 )   HTML ( 4)   PDF (2023KB) ( 526 )  
Related Articles | Metrics
Internet applications require high-performance network server architecture. The authors propose a software architecture for network servers, MEANS (micro-thread architecture for network server), which aims at supporting Internet applications. By introducing a new thread abstract, micro-thread, MEANS upwardly provides a multi-micro-thread environment to programmers, and downwardly accesses the OS services concurrently with traditional threads. MEANS adopts the event-driven mechanism to manage and schedule the micro-threads, which takes advantages of both multi-threaded and event-driven architecture. Moreover, MEANS is general purpose, scalable, robust and adaptable. By preliminary evaluation, in the terms of the concurrence policy, MEANS is similar to the event-driven architecture and outperforms the multi-threaded architecture in I/O accessing. In particular, when the accessed file set is stored in the memory, the throughput of Hammer(2) server based on MEANS is 37.22% more than Apache based on multi-thread architecture, and 34.88% more than LightTPD based on single thread event-driven architecture, and 35.56% more than flash based on AMPED architecture, and is equal with Haboob based on SEDA architecture. While the accessed file set is stored on the hard disk, Hammer(2) increases the throughput by 191.35% compared with Apache, and 131.3% compared with LightTPD, and 904.16% compared with flash, and 45.6% compared with Haboob.
A Hard/Soft Integrated Control Flow Checking Method at Assembly Level
Wu Yanxia, Gu Guochang, Dai Kui, Liu Haibo, and Shen Jing
2010, 47(8):  1481-1489. 
Asbtract ( 451 )   HTML ( 1)   PDF (890KB) ( 381 )  
Related Articles | Metrics
The control flow checking (CFC) technique is one of the effective ways to avoid programs down caused by transient fault. The control flow checking by software signatures (CFCSS) based on assemble code which has been tested on ARGOS satellite, has not only the relatively high error detecting ability and lower cost in redundant instructions, but also more practicability. However, there are still some detection loopholes and mistaken checking in this algorithm. Firstly the detection loopholes and mistaken checking are explained in CFCSS. Then according to the character of assemble, an improved CFCSS algorithm is presented, which modifies the basicblock selecting method and the insert position of multiadjusting signature value assign instructions. Furthermore, In order to improve the detecting ability and reduce the cost in redundant instructions on the basis of the improved algorithm, an improved CFCSS using hardware/software (ICFCSSHS) algorithm is presented. In this algorithm, only the information about signature is generated in compiler and the control flow checking hardware units are triggered by judging the instruction types in the ID stage. Experimental results show that the redundant code memory overhead of this algorithm is 21.5% lower than that of CFCSS algorithm, and the undetected error rate is only 1.5%. It has strong practical value.
A Log-Based Task Model Inference Tool and Its Applications
Gao Chongnan, Yu Hongliang, and Zheng Weimin
2010, 47(8):  1490-1496. 
Asbtract ( 386 )   HTML ( 1)   PDF (1166KB) ( 424 )  
Related Articles | Metrics
Distributed systems are the enabler of todays Internet services. These systems usually involve multiple tiers, a hierarchy of functional abstractions, and high degrees of concurrency. A user request is processed as a task in a series of stages across multiple machines, processors and threads. Verifying the behavior of such individual tasks therefore becomes a very challenging problem since developers have to reconstruct the task flow by linking together pieces of its execution throughout the system. Understanding system runtime behavior is a key to verify system design and debug system logic and performance problems. Existing works extract system runtime as causal paths, but they require either manual annotation or developer-provided execution structures. Manual annotation is tedious and prone to error when system design is complex and code base is evolving rapidly. Providing execution structures requires deep understanding of system design and this approach is limited to core designers or developers. In this paper, the authors try to infer hierarchical task models from system logs. They first extract task information from logs and infer the hierarchical relationships among tasks. Then the task hierarchies are combined into hierarchical task models. Experiences show that the inferred task models can help both understand system design and debug performance problems.