ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 January 2005, Volume 42 Issue 1
   Next Issue
Formal Verification Techniques in Workflow Process Modeling
Zhou Jiantao, Shi Meilin, and Ye Xinming
2005, 42(1):  1-9. 
Asbtract ( 708 )   HTML ( 7)   PDF (510KB) ( 816 )  
Related Articles | Metrics
Workflow process modeling is a complicated and error-prone procedure. It is highly noted in both research and industrial area that the cost for modifying errors is very high after a process definition becomes operational. Thus, effective process verification in modeling phase is very essential. This paper summarizes the state of the art of workflow process verification. Firstly, importance of verification is emphasized, then depiction of problems needed to be verified and their complexities are stated, next the requirements for verification methods are described. After introduction of these essential problems, two sorts of important verification techniques are discussed in detail. The first one is soundness verification, whose research results on Petri net model are presented. The second one is reduction verification, whose research results on Petri net and workflow graph are given. Besides, other verification methods, including correctness verification and those methods based on LTS or UML, etc. are simply mentioned. By analysis and comparison, the problems may be researched and future trends are finally presented.
VAR-Tree—A New High-Dimensional Data Index Structure
Dong Daoguo, Liang Liuhong, and Xue Xiangyang
2005, 42(1):  10-17. 
Asbtract ( 749 )   HTML ( 1)   PDF (432KB) ( 787 )  
Related Articles | Metrics
K nearest neighbor (KNN) search is a very challenging research topic in many application areas, such as multimedia information retrieval and data mining. Lots of index structures have been proposed to solve this problem. However, the query performance based on tree-like index structures would decrease drastically with the increase of dimensionality. As a result, ‘the curse of dimensionality’ is brought about and well known in many index structures like R-Tree, X-Tree and SS-Tree, etc.. Researchers also proposed other methods like VA-File to reduce disk I/O cost by compressing data. However, approximate vectors in VA-File are not sorted or classified hierarchically. In this paper, a new index structure VAR-Tree is proposed, which combines VA-File and R-Tree, and employs R-Tree to manage the approximations. A KNN search algorithm is also presented to perform similarity search in a VAR-Tree. Experimental results show that VAR-Tree has a promising retrieval performance.
A Concurrency Control Protocol for Scheduling Mixed Real-Time Transactions
Wang Qiang, Wang Hongan, Jin Hong, Fu Yong, and Dai Guozhong
2005, 42(1):  18-29. 
Asbtract ( 453 )   HTML ( 0)   PDF (654KB) ( 506 )  
Related Articles | Metrics
Previous study of real-time databases mainly focuses on scheduling and concurrency control of single type real-time transactions. However, there is a very common demand on processing mixed transactions. This paper first introduces a two-level real-time database system model, which provides the support of using aperiodic task scheduling algorithms to improve the system performance. Next, a new real-time concurrency control protocol called mixed concurrency control with dynamic adjustment of serialization order using timestamp intervals (MCC-DATI) is proposed to ensure the data consistency among mixed transactions. The schedulability of hard real-time transactions can be guaranteed by adopting dynamic priority driven scheduling algorithm and bounding the blocking time from aperiodic soft real-time transactions, and the deadline miss ratio of soft real-time transactions may be reduced by adopting aperiodic task scheduling algorithm and dynamic adjustment of serialization order using timestamp intervals. Simulation experiments show that the MCC-DATI protocol can improve the system performance under different workloads and deadline constraints, as compared with previous concurrency control protocols of mixed transactions.
DTD-Based XML Indexing
Lu Yan, Zhang Liang, Duan Qiyang, and Shi Baile
2005, 42(1):  30-37. 
Asbtract ( 501 )   HTML ( 0)   PDF (429KB) ( 570 )  
Related Articles | Metrics
Path expression is a common feature of XML query languages. Many indexing methods have been proposed. DTD can be of great help in XML indexing, but most techniques available now are generic to XML documents that are completely schema-less. Proposed in this paper is DBXI, a new method that takes advantage of information embedded in DTD for speeding up the process of XML path query. DBXI adopts a new coding scheme. With the help of DBXI, a path expression with N elements/attributes and one predicate restriction needs only zero or one structural join operations per XML document. For a path expression that does not match with any paths in XML documents, DBXI can give a judgment of no answer in much shorter time than those of indexing methods in existence. Experimental results demonstrate that DBXI can process path queries faster than Lore, SphinX and XISS do.
LDPChecker—A Model Checking Tool for Real-Time and Hybrid System
Pei Yu, Li Xuandong, and Zheng Guoliang
2005, 42(1):  38-46. 
Asbtract ( 625 )   HTML ( 0)   PDF (504KB) ( 547 )  
Related Articles | Metrics
Hybrid systems are real-time systems that allow both continuous state changes over time periods of positive durations and discrete state changes in zero time. Being an important subclass of hybrid systems, linear hybrid systems are usually modeled using linear hybrid automata. Linear hybrid automata are undecidable in general, but for a subclass of linear hybrid automata called positive loop-closed automata, the satisfaction problem for linear duration properties can be solved by linear programming. To support automatic checking of linear hybrid automata for linear duration properties, a tool named LDPChecker implementing the checking algorithm has been developed. The tool LDPChecker can identify positive loop-closed automaton and perform checking on it. Its key features includes its ability to check real-time and hybrid system for many real-time properties including reachability property, and to generate diagnostic information automatically.
A Weight Adjustment Technique with Feature Weight Function Named TEF-WA in Text Categorization
Tang Huanling, Sun Jiantao, and Lu Yuchang
2005, 42(1):  47-53. 
Asbtract ( 645 )   HTML ( 0)   PDF (397KB) ( 643 )  
Related Articles | Metrics
Text categorization (TC) is an important research direction in Text Mining. It aims to assign one or more predefined category label(s) for a text document, and provides efficient methods for documents management and information searching. A major problem in automatic text categorization is how to select the best feature subset from the original high feature space in order to make the categorization algorithm work efficiently and improve the precision. In this paper, the methods of feature selection and weight adjustment techniques are discussed and analyzed, and their influence on text classification precision and efficiency is pointed out. Furthermore, the TEF-WA (term evaluation function-weight adjustment) is introduced. We introduce a new weight function, which includes feature weight evaluation function and adjusts the effect of the feature term in the classifier according to the feature term's strength. To evaluate the TEF-WA method, experiments are carried by using several different scale training document collection, various term evaluation functions such as document frequency, information gain, expected cross entropy, CHI, the weight of evidence for text, term frequency formula or document frequency formula. The experiment results have proved that the TEF-WA technique is efficient in promoting the classification precision and reducing the compute complexity.
Belief Characteristic's Research of BDI Model
Liu Yong, Pu Shuzhen, Cheng Daijie, and Cao Zehan
2005, 42(1):  54-59. 
Asbtract ( 878 )   HTML ( 1)   PDF (325KB) ( 500 )  
Related Articles | Metrics
Typical model of agent computing is BDI (belief, desire and intention), in which the belief is an important attribute. In this paper, the belief is extended into knowledge belief and achieving belief. Knowledge belief stands for the knowledge an agent has, which has the characteristics of knowledge evolving and inheriting. Achievement belief stands for an event which is not achieved now but will be achieved in future. It is the personality or goal which is awared by agent. Describing these two kinds of belives by the modal logic formalizing tools in non-standard world, the reachable relationship in possible world can be described as the different phases of cognition and of belief realization. This avoids the logical omniscience problem and the logical side effect problem. Knowledge belief satisfies the KDT4 axioms, and achievement belief satisfies KD axioms. These two kind of believes can be used in mental state and mental model describing.
Improvement of Local Research in SAT Problem
Yang Jinji, and Su Kaile,
2005, 42(1):  60-65. 
Asbtract ( 641 )   HTML ( 0)   PDF (337KB) ( 454 )  
Related Articles | Metrics
Recently, local search methods for solving the SAT problem have attracted considerable attention because they are the best known approaches to several hard classes of satisfiability. In this paper, an efficient initial probability is presented, which can constrain the random initial assignment of variables in the local search method. Also discussed is how to use the initial probability to improve the current local search algorithms such as WSAT, NSAT, TSAT, SDF, etc.. The strategies are straightforward and intuitive, and are helpful for increasing the number of satisfied clauses at its beginning of local search, thus decreasing the number of flipping for finding a solution and making the local search process converge fast. For this method to be compared with the other current local algorithms, many general random 3-SAT problem instances, structured SAT problem instances in SATLIB and phase transition's SAT problem instances are tested. The actual experimental results indicate that the improved algorithms are more efficient than the current local search SAT solvers such as WSAT, and so on. Moreover, this method can also be applied to other related methods to improve their efficiency.
Identification of Taste Signals of Tea Based on Minimal Uncertainty Neural Networks
Wang Yan, Zhou Chunguang, Huang Yanxin, and Feng Xiaoyue
2005, 42(1):  66-71. 
Asbtract ( 458 )   HTML ( 0)   PDF (390KB) ( 448 )  
Related Articles | Metrics
It is well known that determining the structure and training the parameters of neural networks efficiently are difficult in the field of neural networks research. These years it has been somewhat successful to construct neural networks in light of Bayesian theorem and to optimize neural networks according to particle swarm optimization respectively. A novel model of taste signals recognition based on minimal uncertainty neural networks is proposed in this paper. The model adopts minimization uncertainty adjudgment to construct the networks structure, and uses Bayesian theorem and particle swarm optimization (PSO) to determine the parameters of the networks rapidly and efficiently. The identification of the taste signals of 10 kinds of tea is successful in utilization of this model. The experimental results show the feasibility and probability of introducing the proposed model to the identification of taste signals of tea. Section 2 presents the model of minimal uncertainty neural networks (MUNN). How to determine the weights and biases of MUNN by Bayesian theorem, PSO, and the hybrid of them are illustrated respectively in section 3. The experimental results are presented and discussed in section 4. Conclusions are in section 5.
Data Mining Based on Segmented Time Warping Distance in Time Series Database
Xiao Hui and Hi Yunfa
2005, 42(1):  72-78. 
Asbtract ( 756 )   HTML ( 0)   PDF (472KB) ( 1294 )  
Related Articles | Metrics
Data mining in time series database is an important task, most research work are based comparing time series with Euclidean distance measure or its transformations. However Euclidean distance measure will change greatly when the compared time series move slightly along the time-axis. It's impossible to get satisfactory result when using Euclidean distance in many cases. Dynamic time warping distance is a good way to deal with these cases, but it's very difficult to compute which limits its application. In this paper, a novel method is proposed to avoid the drawback of Euclidean distance measure. It first divides time series into several line segments based on some feature points which are Chosen by some heuristic method. Each time series is converted into a segmented sequence, and then a new distance measure called feature points segmented time warping distance is defined based this segmentation. Compared with the classical dynamic time warping distance, this new method is much more fast in speed and almost no degrade in accuracy. Finally, implements two completed and detailed experiments to prove its superiority.
An Improved Model for Adaptive Text Information Filtering
Ma Liang, Chen Qunxiu, and Cai Lianhong
2005, 42(1):  79-84. 
Asbtract ( 430 )   HTML ( 0)   PDF (342KB) ( 611 )  
Related Articles | Metrics
The information filtering technology is usually used to track favorite topics and eliminate garbage content from information stream. The adaptive information filtering, which requires little initial training resource and can actively improve itself in filtering process, provides a better performance and convenience than the old way. But there are still some difficulties in training and adaptive learning. In this paper, an improved filtering model for adaptive text filtering is proposed. In this model, two retrieval/feedback mechanisms are used respectively. One is based on vector space model and Rocchio feedback algorithm, and another mechanism is derived from a latest language model IR system. Based on them, an incremental learning method using multi-step pseudo feedback is introduced in profile training to keep a minimal bias to the original topic, and an adaptive profile adjusting mechanism in filtering process, which newly takes into account the document distribution and the decay rate of the topic feature, is also developed. The running system constructed using the new model got a high evaluation score in related international contest, indicating that the improvements in the filtering model are effective.
A Simple and Efficient Algorithm to Classify a Large Scale of Texts
Wang Jianhui, Wang Hongwei, Shen Zhan, and Hu Yunfa
2005, 42(1):  85-93. 
Asbtract ( 609 )   HTML ( 5)   PDF (516KB) ( 890 )  
Related Articles | Metrics
Most of classifying methods are based on VSM (vector space model) in the research on classification at present, of which the widely-used method is kNN (k-nearest neighbors). But most of them are highly complicated on computation, and cannot be used on the occasion of classifying a large number of specimen. Moreover, to them, the classifier must be rebuilt when to increment the corpora of the training specimen. So they have tough scalability. Two new concepts, MD (mutual dependence) and ER (equivalent radius), are put forward in this paper. Furthermore, a new classifying method, SECTILE, is offered. SECTILE can be used to classify a large number of specimen and has good scalability. Later, SECTILE is applied to classify Chinese documents and compared to kNN and CCC method. As a result, SECTILE outperforms kNN and CCC method, and can be used online to classify a large number of specimen while the precision and recall of classification are kept.
Using Maximum Entropy Model for Chinese Text Categorization
Li Ronglu, Wang Jianhui, Chen Xiaoyun, Tao Xiaopeng, and Hu Yunfa
2005, 42(1):  94-101. 
Asbtract ( 3618 )   HTML ( 38)   PDF (409KB) ( 4196 )  
Related Articles | Metrics
With the rapid development of World Wide Web, text classification has become the key technology in organizing and processing large amount of document data. Maximum entropy model is a probability estimation technique widely used for a variety of natural language tasks. It offers a clean and accommodable frame to combine diverse pieces of contextual information to estimate the probability of a certain linguistics phenomena. This approach for many tasks of NLP perform near state-of-the-art level, or outperform other competing probability methods when trained and tested under similar conditions. However, relatively little work has been done on applying maximum entropy model to text categorization problems. In addition, no previous work has focused on using maximum entropy model in classifying Chinese documents. Maximum entropy model is used for text categorization. Its categorization performance is compared and analyzed using different approaches for text feature generation, different number of feature and smoothng technique. Moreover, in experiments it is compared to Bayes, KNN and SVM, and it is shown that its performance is higher than Bayes and comparable with KNN and SVM. It is a promising technique for text categorization.
Transport Mechanism Applied to Computer Network Protocol Conformance Testing
Zhang Yujun, Li Zhongcheng, Zheng Hongxia, Tian Ye, and Sun Jingbo
2005, 42(1):  102-108. 
Asbtract ( 559 )   HTML ( 1)   PDF (327KB) ( 489 )  
Related Articles | Metrics
Transport mechanism can influence test system implementation and test suite design. Good transport mechanism can shield the information of implementation under test (IUT) to test suite designer and simplify test configuration. Based on the idea that test suite design and test transport be independent, the technique that combines communication entity with network interface is proposed to implement transport mechanism. The logical test interfaces are defined by network interfaces. The test packets are received/sent by communication entity, which makes the lower service provider transparent to test suite. Manual configuration isn't necessary before testing. By setting appropriate identifier information in the object of network interface, the concurrent test ability of test system is independent of system configuration. It is successful to apply the transport mechanism to IPv6 protocol conformance testing.
A Peer-to-Peer DHT Algorithm Based on Small-World Network
Zhou Jin and Li Yanda
2005, 42(1):  109-117. 
Asbtract ( 542 )   HTML ( 0)   PDF (528KB) ( 584 )  
Related Articles | Metrics
Efficient search technology is a primary key to peer-to-peer systems. In order to address the inefficient routing problem, a DHT algorithm is presented based on the policy of clustering keys of sharing files. Under the key clustering algorithm, the ring-like key space is divided into two layers. Meanwhile, each node obtains a randomly assigned numeric value as its clustering center in key space. The lower layer, which is called AUT layer, is responsible for managing files' keys. The upper layer, HUB layer, is responsible for maintaining the clustering center of nodes. According to the AUT layer and the HUB layer separately, file keys and node clustering centers discovered from bypassing routing messages are clustered towards the clustering center of local node. The algorithm solves the problems of present unstructured algorithms in some degree, in which the datum is organized in a confused situation and routing is proceed in a disordered manner. Further, the outcome of small-world is used to cut down latency of routing hops. To improve the algorithm, a few connections to remote nodes are introduced with a suitable probability into routing table of clustering keys. In another word, the criterion of clustering is not of determination, but of probability. In this way, it is possible to route overlay network with average latency of O(log\+2N) hops.
A Fast and Effective Static Task Scheduling Algorithm in Homogeneous Computing Environments
Li Qinghua, Han Jianjun, and Abbas A. Essa
2005, 42(1):  118-125. 
Asbtract ( 457 )   HTML ( 0)   PDF (421KB) ( 504 )  
Related Articles | Metrics
Efficient scheduling of tasks is a key issue for achieving high performance in multiprocessor systems. At present the most popular model characterizing dependencies between tasks is DAG(directed acyclic graph). In previous reference, a novel model called TTIG (temporal task in interaction graph) that is more realistic and universal than DAG, and its corresponding algorithm called MATE(mapping algorithm based on task dependencies) were presented. This paper extends TTIG model, and proposes a new static scheduling algorithm called GBHA (group-based hybrid algorithm) and two heuristic methods. GBHA tries to collect the tasks that interact with each other into a group so that it can capture global information about almost all dependence relationship between tasks in task graph. Moreover, GBHA ranks the tasks based on groups from bottom to top in graph, and maps the tasks onto processors according to their earliest finish time or earliest start time on each processor. In this work, the algorithms are compared with MATE and some well-known scheduling algorithms based on DAG in homogeneous systems, such as FCP. The results of simulation experiment and analyses of time complexity show that scheduling performance of our algorithms not only outperforms MATE significantly but also can be comparable to efficient scheduling algorithms based on DAG but with much lower time complexity.
The Model and Adaptive Configuration Management Strategy for a Distributed Database Server System
Tian Junfeng, Liu Yuling, and Du Ruizhong
2005, 42(1):  126-133. 
Asbtract ( 456 )   HTML ( 0)   PDF (468KB) ( 535 )  
Related Articles | Metrics
Server redundancy technology solves the problem of availability and the performance bottleneck in the traditional distributed environment, and at the same time presents a new challenge to the system management. A kind of constituting principle and working model of a distributed database server system (DDSS) is presented in this paper. To overcome the weakness of dynamic scalability of redundancy resource in configuration management of the present redundancy server system, aiming at the server DDSS (every service in the system is abstracted to an object class in the model, and multi-instance of the same object class redundantly serving each other is proposed), the configuration management of redundancy resource is discussed, and adaptive configuration management policy based on mobile agent technology is proposed. With the precondition of ensuring the system availability, the system performance is improved and the resource waste is reduced. In ACM, by defining award-penalty function (as for static configuration) and user request arriving rate (as for dynamic configuration) which is the gist of configuration, redundancy instances is added or deleted dynamically. Finally, the algorithmic performance is analyzed and tested, and compared with traditional algorithm.
Genetic Algorithm-Based Least Square Fitting of B-Spline and Bézier Curves
Zhou Minghua, and Wang Guozhao
2005, 42(1):  134-143. 
Asbtract ( 782 )   HTML ( 2)   PDF (590KB) ( 784 )  
Related Articles | Metrics
To obtain a good approximation for data fitting with a spline, parameters and knots have to be treated as variables frequently. There are two kinds of considerations. The first is to choose parameters with which the fitting are better while the knots of the B-spline bases are in a fix. The choices of parameters include uniform parameterization, cumulative chord length parameterization, centripetal model parameterization and Gauss-Newton approach. The other is to determine parameters in advance (generally cumulative chord length parameterization) and then to compute the knots of B-spline bases by some algorithms such that the fitting becomes more precise. In this paper both the parameters and the knots of the B-spline bases are considered simultaneously by using genetic algorithms such that the fitting B-spline curve to data attains its optimum in the total least squares sense. With this, the parameters and the knots can be appropriately determined simultaneously. The method given in this paper have advantages of robustness (the resulting curve is initial-value-free), better precision and fewer vertexes compared with Gauss-Newton approach and Piegl's algorithm. Two examples of data fitting are given to show that the genetic algorithms-based fitting curves are better in approximation. Fitting Bézier curve to a set of data by using genetic algorithms is also studied.
Recognition and Understanding of Architectural Drawings: Model and Algorithm
Lu Tong, Xi Xiaopeng, Rui Ming, Cai Shijie, and Dou Wanchun
2005, 42(1):  144-152. 
Asbtract ( 628 )   HTML ( 1)   PDF (519KB) ( 1292 )  
Related Articles | Metrics
Existing methods of recognition and understanding of architectural drawings suffer from the mitations of sensitivity to noise and bad generality, which makes precise analysis and econstruction much more difficult. Analyzing the semantic content of architecture and simulating the way human read architectural drawings, an efficient self-incremental axis-net-based hierarchical recognition (SINEHIR) model is proposed. SINEHIR recognizes each class of architectural components sequentially, driven by recognized components in the drawing which is incrementally simplified by eliminating the interference of previous recognition. Also presented are feature-based symbol recognition method, symbol-based axes net recognition method, section-tracking-based node recognition method, semantic-relation-based segment recognition method, geometry-based combined component recognition method, and inheritance-based data transition method of SINEHIR.
Design-for-Testability and Test Technologies for System-on-a-Chip
Hu Yu, Han Yinhe, and Li Xiaowei
2005, 42(1):  153-162. 
Asbtract ( 749 )   HTML ( 8)   PDF (488KB) ( 774 )  
Related Articles | Metrics
A comprehensive survey of the up-to-date design-for-testability(DFT) methods and testing technologies for system-on-a-chip(SOC) is presented. The techniques of DFT and testing for embedded cores of digital, analog/mixed-signal, memory and processor are introduced respectively. Among these techniques, some advanced scan and built-in-self-test schemes to provide at-speed test capability or to reduce test application time, test power consumption and test data volume are emphasized. The DFT and testing techniques for SOC at system level are also surveyed. Since test resources are very important to cope with new issues of testing SOC, design, partitioning and optimization of test resources are described in detail. In an SOC, on-chip test resources generally include test access mechanism, test wrapper, and test source and sink. For test source and sink, test resource partitioning approaches based on test stimuli compression/decompression and test response compaction are overviewed. For test access mechanism and test wrapper, test resource optimization techniques of test scheduling based on heurist algorithms are presented. The SOC test standardization by two organizations is introduced. Finally, some future directions in DFT and test technologies for SOC are indicated, and an extensive bibliography is also provided.
Overview of Wireless Sensor Networks
Cui Li, Ju Hailing, Miao Yong, Li Tianpu, Liu Wei, and Zhao Ze
2005, 42(1):  163-174. 
Asbtract ( 1272 )   HTML ( 45)   PDF (543KB) ( 2567 )  
Related Articles | Metrics
More and more academic researchers and people from industry are engaged in developing wireless sensor networks due to the great promise and potential with various applications. This paper describes the basic concept and hardware platform for sensor network. Then the network protocol architecture is proposed with brief introduction of each individual research issue involved. Detailed progress in a few most important research areas is also discussed, including data link layer protocol, network layer routing protocol, power management and network simulation tool.