Journal of Computer Research and Development 2006 Vol. 43 No. 5

A Survey of Statistical Language Modeling for Text Retrieval

2006, 43(5): 769-776.

Abstract(863) PDF(428)(1146)

Abstract：
Relying on the powerful statistical inference theory, statistical language modeling (SLM) has gradually become one of the crucial techniques in lingual information processing. Ponte & Croft first applied SLM to text retrieval in 1998. Since then a large number of studies have concentrated on the language models. In recent years research work carried out by many groups has confirmed that the language modeling approach is a theoretically attractive and potentially very effective probabilistic framework for studying text retrieval problems. In this paper the basic language modeling approaches to text retrieval (SLMTR) are surveyed. The query likelihood model is formally described, and the document language model estimation and data smoothing problems are discussed in detail. Some significant extensions and improvements to the query likelihood model are also presented. Finally, several primary challenges and open issues in SLMTR are summarized.

Model-Based Methods for Software Cost Estimation

He Xiaoyang and Wang Yasha

2006, 43(5): 777-783.

Abstract(907) PDF(356)(674)

Abstract：
Accurate estimation is the foundation of effective project planning, tracking and controlling. Model-based methods, in contrast to expert-experience-based methods, are more independent of individual's capabilities and more reusable in software development organizations, so they are the focus of research in software cost estimation. They may be classified into algorithm-driven models, data-driven models and composite models. In term of the classification schema, the typical methods are described in detail. Then, their assumptions, application environment, advantages and limitations are analyzed deeply which are viewed from both internal attributes and external evaluation. Finally, the future of software cost estimation research is discussed. The purpose of the paper is to compare the advantages and disadvantages of the representative estimation models and provide support for using the suitable models for software development organization.

Research on Topology Discovery in the Overlay Multicast

Cao Jia, Lu Shiwen

2006, 43(5): 784-790.

Abstract(840) PDF(380)(507)

Abstract：
In the overlay multicast, hosts perform the topology discovery and build multicast tree by themselves. This may ruthlessly wastes network resources if the overlay paths mismatch the underlay paths. Otherwise, it will be improved. In this paper, an active topology discovery method based on the random directed graph is analyzed. It is shown that if each node chooses Θ(logn) nodes randomly to test, the topology will be non-strong connective, and if each node chooses 2.997×n\-\{0.5312\} nodes randomly to test, the length from the source host to any hosts will be at most two times the length of directed unicast path. The simulations show that the path will not be improved obviously with the increase of k value, after the k reaches a certain value.

Research on Mobile Agent Path Optimization Algorithm in Grid

Zhang Zhirou, Luo Siwei, Chen Xin, Zhong Jingjing

2006, 43(5): 791-796.

Abstract(743) PDF(312)(537)

Abstract：
It is proposed that mobile agents act as carriers of tasks of applications or services in dynamic heterogeneous grid environments because of autonomy and intelligence of mobile agents. And a new TSP model—dynamic market TSP model, is introduced that is more suited for grid and a path optimization algorithm is proposed to optimize the migration paths of mobile agents in this model. The simulation experiments in different TSP maps have been conducted for this algorithm. The data show that this algorathm makes mobile agents run through less distance for their tasks than the traditional optimization algorithm does, and it converges more quickly.

A Static Scheduling Algorithm Based on Global Task-Transferring

Liang Hongtao, Yuan Youguang, Fang Ming

2006, 43(5): 797-804.

Abstract(797) PDF(571)(633)

Abstract：
The task scheduling in a distributed real time system is known as an NP-complete problem. A typical heuristic algorithm, TDS, can generate optimal schedule in scheduling length under optimality condition. The TDS algorithm can not minimize the demanded processor number because it selects and schedules task nodes in a local area. In this paper, a global task-transferring (GTT) based scheduling algorithm is proposed as an advanced TDS algorithm. Compared with other existing scheduling schemes, including TDS, the GTT algorithm can minimize the processor number, improve speedup and efficiency obviously, and keep the optimality in scheduling length and optimality condition. The GTT algorithm has the time complexity of O(d|V|2) where |V| represents the number of tasks and d represents the maximum degree of tasks.

A Real Time Network Traffic Prediction Algorithm Based on Hybrid Model

Li Jie, Liu Ruixin, Liu Xianxing, Han Zhijie

2006, 43(5): 806-812.

Abstract(1102) PDF(474)(979)

Abstract：
Distributed applications use predictions of network traffic to sustain their performance by adapting their behaviors. It has been recognized that the network traffic consists of a majority of linear part and a small quantity of non-linear part which can not be neglected. However, existent network traffic prediction algorithms only utilize either linear or non-linear methods to solve the problem and can not provide enough accuracy and realtime due to the isolated adoption. A hybrid network traffic prediction algorithm, is provided, in which Kalman filter (KF) and wavelet are combined. Thus the linear part can be processed by KF and the non-linear part can be done by wavelet. Simulation results show that the proposed algorithm can guarantee higher accuracy and better realtime than those algorithms based singly on linear or non-linear method.

A Schedulability Analysis Algorithm for EDF-Based End-to-End Real-Time Systems

Shen Zhuowei and Wang Yun

2006, 43(5): 813-820.

Abstract(869) PDF(394)(700)

Abstract：
An end-to-end real-time scheduling model can be used to describe many distributed real-time systems. In this paper, an EDF-based end-to-end real-time scheduling model is proposed. According to the time demand analysis technique, a schedulability condition for the end-to-end real-time system is deduced. Then, a schedulability analysis algorithm is developed. The schedulability condition and schedulablity analysis algorithm are suitable not only for end-to-end real-time systems controlled by non-greedy synchronization protocols, but also for those controlled by greedy synchronization protocol. Compared with fix-priority-based end-to-end real-time scheduling model and its algorithms, the model and algorithm proposed in this paper are simpler and easier to implement. Simulation results reveal that higher performance can be achieved too.

The Semantic Based Information Service of an Image Processing Grid Platform

He Ruhan, Jin Hai, Liao Zhensong, and Zhang Qin

2006, 43(5): 821-827.

Abstract(651) PDF(407)(491)

Abstract：
For better efficiency and accuracy, it needs semantic match between grid services to implement the service discovery, the service query, and the dynamic allocation and replacement of service. An ontology-based semantic service component of the Image Processing Grid Platform is introduced in this paper. The framework of this component is firstly described and the core semantic match algorithm is given subsequently. The semantic service component implements the semantic query of grid service, the automatic semantic match for workflow construction, as well as the dynamic allocation and replacement of grid service when workflow is running. Finally, the experimental results show that the proposed algorithm has better accuracy than the classical semantic match algorithm for the grid service matching.

Indexing Moving Objects Trajectories on Fixed Networks

Li Guohui and Zhong Xiya

2006, 43(5): 828-833.

Abstract(626) PDF(345)(532)

Abstract：
In a kind of spatiotemporal database applications, objects move on the road networks. To process the position information for such kind of moving objects, people have proposed some index models, but they all have their limitations. These models are unable to index both the present and past positions of moving objects. Meanwhile, they only support window query or trajectory query. A new indexing technique which is called indexing moving objects trajectories on fixed networks (IMTFN) is proposed in this paper. IMTFN consists of a 2-dimensional (2D) R\+*-Tree for managing the fixed networks, a forest of 1-dimensional (1D) R\+*-Trees indexing the time interval for managing the position of moving objects, and a hash structure for the newest location of moving objects. IMTFN supports the efficient query of the present and past positions of moving objects, optimizes operations of windows query and trajectory query. Extensive experiments are conducted to evaluate the performance of the proposed indexing mechanism and show that IMTFN performs considerably better than STR-Tree and FNR-Tree.

An Efficient Discovering and Maintenance Algorithm of Subspace Clustering over High Dimensional Data Streams

Zhou Xiaoyun, Sun Zhihui, Zhang Baili, and Yang Yidong

2006, 43(5): 834-840.

Abstract(681) PDF(418)(814)

Abstract：
Data mining based on data stream has become a very hot research field in recent years. In this paper a novel discovering and maintenance algorithm of subspace clustering over high dimensional data streams is presented, which is based on Hoeffding bound and named SHStream. SHStream partitions data streams (the length of each segment is computed by Hoeffding bound), makes subspace clusters on the segments and discovers clusters step-by-step. Meanwhile, focusing on dynamic of data stream, SHStream adjusts and maintains the cluster results. SHStream can deal with high dimensional clustering problem effectively and discover clusters with arbitrary shape through the technology based on grids and density. The experimental results on real datasets and synthetic datasets demonstrate promising availabilities of the approach.

An Evolutionary Programming to Solve Constrained Optimization Problems

Dong Hongbin, Huang Houkuan, He Jun, Hou Wei

2006, 43(5): 841-850.

Abstract(665) PDF(537)(589)

Abstract：
A mixed strategies evolutionary programming to solve constrained optimization problems is presented in this paper. The approach does not require the use of a penalty function. Instead, it uses a diversity conservation mechanism based on allowing infeasible solutions to remain in the population. A mixed mutation strategy and feasibility-based comparison mechanism is used to guide the process fast toward the feasible region of the search space. This new evolutionary programming has been tested on 13 benchmark functions. The results obtained show that the new approach is a general and effective method.

K-Cluster Subgoal Discovery Algorithm for Option

Wang Bennian, Gao Yang, Chen Zhaoqian, Xie Junyuan, Chen Shifu

2006, 43(5): 851-855.

Abstract(787) PDF(349)(604)

Abstract：
Discovering useful subgoals and creating options while learning is important to improve the learning performance of agent in hierarchical reinforcement learning. A new subgoal discovery algorithm based on k-cluster is proposed which can extract subgoals from the set of trajectories collected online by clustering them. The results of experiment show that the algorithm can find all sobgoals quite efficiently, and the hierarchical reinforcement learning with k-cluster algorithm has better learning performance than Q-learning and hierarchical reinforcement learning with diverse density algorithm.

Study of the Counterpropagation Network Based on the EM Algorithm and Its Application

Hao Yu, Ye Shiwei

2006, 43(5): 856-861.

Abstract(573) PDF(418)(540)

Abstract：
The paper is concerned with the counterpropagation network(CPN) based on the soft-competition mechanism instead of that of the winner-take-all to overcome the later's faults, and the complexity of the network is raised too. By regarding the output of the hidden node in the competition layer as the unknown default random variable, the CPN based on the soft-competition mechanism is trained by the EM algorithm in order to decrease the complexity and accelerate the velocity of convergence of the network. In the M step of the EM algotithm, the algorithm in common use is modified according to the characterstic of the competition layer of the improved CPN, replacing the iteratively reweighted least-squares method with the sample reweighted average method. The results of simulating experiments show that the modified network has a better performance while the training time is greatly reduced. The practicability and convergence of the network is improved rapidly, being particularly useful in the field of classification of patterns.

Research on Solving the Problem of CMAC Neural Network Collision

Su Xiaohong, Zhang Mingjie, Ma Peijun, and Wang Yadong

2006, 43(5): 862-866.

Abstract(650) PDF(323)(428)

Abstract：
Aiming at the problem of physical mapping address collision in the learning algorithm of CMAC(Cerebellar model articulation controller) neural network caused by Hash mapping in the algorithm, a method based on setting a weight overflow area is proposed in this paper. Compared with traditional learning algorithms which solve the collision problem through increasing the size of real space, this method have the advantages of saving physical memory space and improving the precision of network-learning under the conditions of the same size of real space. Finally, the experiment results show that it works well in the applications of nonlinear system identification and color matching.

The Studying of Frame APRF of Pattern-Recognition Based on Agent

Tang Yunting, Cheng Xianyi

2006, 43(5): 867-873.

Abstract(761) PDF(440)(484)

Abstract：
The pattern-recognition course is to the categorised course which inputs the mode, kind is essence of the mode, it is generally acknowledged the vector quantity of the characteristic is a generalization of the mode, because of this understanding, Make the computer discern accurately the mode is difficult for some one kind, the problem lies in on the basis of what characteristic vector quantity summarized the human discernment activity is never, But because of the understanding that input a certain side of the mode, then it is similar to utilize, merge, discern in coordination. The operation principle of this mechanism and Agent is extremely similar, this text has analysed that summarizes the difficulty brought toward pattern-recognition of mode, on the basis of the concept of the parameter of the preface while studying in coordination, provide the frame of pattern-recognition based on Agent APRF(Agent orientation Patten Recognition Frame).

Equation Theory of Type System λω×\-≤ and the Soundness of Its Semantics

Zhou Xiaocong

2006, 43(5): 874-880.

Abstract(685) PDF(434)(604)

Abstract：
Type system plays an important role in the research of the foundations of programming language. In particular, the polymorphic type systems with subtyping capture the key concepts of object-oriented programming language such as subtyping and polymorphism. A polymorphic type system with higher-order subtyping, which is called the type system λω×\-≤, is investigated. By using the inserters and the theory of fibrations, a kind of fibration called the λω×\-≤ fibration is introduced as the semantic model of the type system λω×\-≤. In this paper, the equation theory of the type system λω×\-≤, especially, the equations related to the bounded quantification type, are discussed. Furthermore, by using the properties of the inserters, the semantic soundness of the λω×\-≤ fibration with respect to the equation theory of the type system λω×\-≤ is shown.

A Software Reliability Growth Model Considering Testing Environment and Actual Operation Environment

Zhao Jing, Liu Hongwei, Cui Gang, and Yang Xiaozong

2006, 43(5): 881-887.

Abstract(587) PDF(400)(747)

Abstract：
The testing and operation environment may be essentially different, thus the fault detection rate of testing is different from that of the operation phase. Software reliability growth models (SRGMs) based on the non-homogeneous Poisson process (NHPP) are quite successful tools that have been proposed to assess the reliability of software. The constant environmental factor is proposed by some authors to describe the mismatch between the testing environment and operation environment in SRGMs of NHPP. Actually, the environmental factor ought to be varying with testing time. The varying environmental factor with time can be derived from actual failure data set. The fault detection rate (FDR) of operation is transformed from that of testing phase and varying environmental factors, considering the fault remove efficiency and fault introduction rate, and then an NHPP model PTEO-SRGM is presented. Finally, the unknown parameters are estimated by the least-squares method based on two failure data sets. Experiments show that the goodness-of-fit and predictive power of PTEO-SRGM is better than those of other SRGMs on these two data sets.

Hybrid Indexing of Moving Objects with Frequent Updates

Liao Wei, Xiong Wei, Jing Ning, Chen Hongsheng, and Zhong Zhinong

2006, 43(5): 888-893.

Abstract(716) PDF(380)(584)

Abstract：
TPR-tree is the most popular indexing method for the current and future position of moving objects，but its frequent updates performance is very low. HTPR-tree, which is based on TPR-tree, supplemented by a hash index on leaf nodes and a memory-based direct-access table pointing to internal nodes of TPR-tree is presented for moving objects with frequent updates. Also an extended bottom-up update algorithm is developed for HTPR-tree. Performance analysis and experimental results show that the HTPR-tree update performance outperforms any other indexing method including TPR\+*-tree at the cost of slight query performance degrade.

Constraint-Based Termination Analysis of Active Rules

Xu Guihong, Zhang Jian

2006, 43(5): 894-900.

Abstract(660) PDF(425)(398)

Abstract：
Detecting termination of rules is in general undecidable. Existing static analysis techniques are very conservative, and the current SQL3 standard does not prescribe methods for ensuring termination, so most commercial database products impose a fixed upper limit on the number of cascading rule firings. The commercial solution has undesirable effects e.g. some correct rule firing sequences may prematurely be halted. Since a rule can be viewed as a database state transformer, in this paper, database states are expressed with linear and Boolean hybrid constraints, and processing of rules is simulated based on constraint expressions and constraint solving. This method can obtain more precise conclusion than the existing methods.

An Approach to Termination Decision for a Rule Set Based on Activation Path and Conditional Formula

Xiong Zhongmin, Hao Zhongxiao

2006, 43(5): 901-907.

Abstract(678) PDF(385)(382)

Abstract：
Supporting active rules has become an important characteristic of modern database systems. Active rules termination is an important problem of active database. To prove termination by using triggering and activation graphs has conservativeness. In order to refine the existing works, the formula constructed for an effective activation path is proposed in this paper and a novel approach about how to determine the termination of active rules set is presented. The analytical results prove that the proposed approach can detect more termination situations than any other existing works. Moveover, an algorithm for termination decision is provided, and its correctness and termination are proved.

A Time Serial Model for Divisional Dynamic Terrain

Chen Guojun, Zhao Qinping

2006, 43(5): 908-913.

Abstract(721) PDF(355)(469)

Abstract：
In distributed virtual environment ，terrain can be changed as a result of the interaction between the entity and the terrain. To maintain consistency and reliability of the simulation it is required to serialize and arbitrate terrain changes. A time serial model for divisional dynamic terrain is built according to terrain local modifications. The model is based on a collection of a divisional terrain data defined as simplicial complexes arranged into a partially ordered set by time and space. Features of the model are to provide an insertion of terrain modification and an extraction of terrain at a particular moment. Experimental results on real terrain data show that the algorithm based on one DAG improves the computing performance of an extraction of terrain.

Uniform Interval Implicitization of Rational Surfaces

Li Yajuan and Wang Guozhao

2006, 43(5): 914-919.

Abstract(684) PDF(347)(558)

Abstract：
Parametric curves/surfaces and algebraic curves/surfaces are two common types of representations of geometric objects in computer aided design and geometric modeling. Both of these representations have their own advantages and disadvantages. Thus it is important to have both representations at the same time. In this paper, a new concept called uniform interval implicitization of rational curves is proposed, which is finding an uniform interval curve with lower degree bounding a given rational curve and minimizing some objective function. An algorithm and some examples are provided to demonstrate the theory.

A New Method of Edge Detection Based on Statistical Vector and Neural Network

Zhang Jing, Zhang Quan, and Wang Xin

2006, 43(5): 920-926.

Abstract(767) PDF(511)(575)

Abstract：
A new edge detector based on the statistical vector and neural network is presented in the paper. Firstly, based on the distribution characters of intensities in the neighborhood of an edge pixel, a statistical vector composed of 4 components is proposed. Then through the training with the statistical vector samples calculated from training images, the BP neural network acquires the function of a desired edge detector. Finally, the trained BP neural network is used for edge detection directly. Experiments are carried out with both noisy artificial and natural images. The proposed edge detector proves robust against noise. Besides, both the architecture and training of the BP neural network are simple. Moreover, the proposed edge detector needs no thresholds for conventional edge detection methods.

New Word Detection Based on Large-Scale Corpus

Cui Shiqi, Liu Qun, Meng Yao, Yu Hao, Nishino Fumihito

2006, 43(5): 927-932.

Abstract(939) PDF(331)(1325)

Abstract：
New word detection is a part of unknown word detection. The development of natural languages requires us to detect new words as soon as possible. In this paper, a new approach to detect new words based on large-scale corpus is presented. It first segments the corpus from the Internet with ICTCLAS, and searches for repeated strings, and then designs different filtering mechanisms to separate the true new words from the garbage strings, using rich features of various new word patterns. While getting rid of the garbage strings, three garbage lexicons and a suffix lexicon are used, which are learned by the system, and good results are achieved. Finally, the results of the experiments are discussed, which seem to be promising.

Word Sense Disambiguation Based on Multi-Classifier Decision

Quan Changqin, He Tingting, Ji Donghong, Yu Shaowen

2006, 43(5): 933-939.

Abstract(647) PDF(432)(803)

Abstract：
The problem of word sense disambiguation can be formalized to be a typical classify problem. The committee classifiers are trained by learning a small set of labeled examples, and then these classifiers are updated dynamically by unlabeled examples. The senses of ambiguous words are determined by combining the decision of the final committee classifiers. This approach avoids constructing large-scale sense-tagged corpus, and has higher accurate rate.

Lossless Configuration Bitstream Compression for Virtex FPGAs

Gu Haiyun, Li Li, Xu Juyan, Gao Minglun

2006, 43(5): 940-945.

Abstract(714) PDF(397)(514)

Abstract：
With the drastical improvements of FPGA's density and performance, the size of the configuration bitstreams has also increased considerably. Therefore, configuration time of FPGA is increasingly becoming a concern, and the memory required for storing various FPGA configuration bitstreams will significantly affect the cost of FPGA-based embedded systems. In this paper, an adaptive LZW algorithm is proposed for compressing the Virtex FPGAs' configuration bitstreams based on detailed analysis of their data regularities. The required decompression hardware is simple. Using this technique, it is demonstrated up to 45.63% compression rate for configuration bitstreams of several real-world applications.

A Fast Algorithm for Leakage Power Reduction by Input Vector Control

Chang Xiaotao, Fan Dongrui, Han Yinhe, Zhang Zhimin

2006, 43(5): 946-952.

Abstract(639) PDF(382)(591)

Abstract：
As IC technologies evolve, leakage current is increased exponentially with transistor threshold voltages reduced to maintain adequate performance and noise margins. As a result, leakage power will dominate total power dissipation soon. The stack effect of CMOS transistors makes leakage power different when different input vectors are applied. Input vector control is an effective method when a circuit enters sleep mode. It becomes critical to find a vector that minimizes leakage power to be statically applied to the primary inputs of a circuit. In order to evaluate various algorithms, a gate-level combinational circuit simulator is developed. An algorithm based on probabilities propagation is presented. Experimental results on ISCAS85 benchmark and Godson CPU show that this algorithm can accelerate calculation over triply with acceptable accuracy (error rate about 0.14%).

Formal Analysis of Security Protocol Based on Process Calculus and Knowledge Derivation

Gu Yonggen, Fu Yuxi

2006, 43(5): 953-958.

Abstract(755) PDF(331)(506)

Abstract：
Formal analysis of security protocols is becoming more and more important. It is desiderated to expand the existing methods to study more security properties and to form a unified framework to analyze various security properties. Process calculus is a powerful tool for modeling concurrent systems. The existing process calculi, however, are not very convenient to support data structure. In this paper, a generic model is proposed for the analysis of security protocols based on a process calculus with knowledge derivation. The model facilitates the formal definitions of some well known security properties. Using this model the Needham-Schroeder public-key protocol is analyzed as a case study. Some future directions are pointed out.

2006 Vol. 43 No. 5

Notices More+

Journal Dynamics More+

Academic Activities More+

Links More+

2006 Vol. 43 No. 5

Notices More+

Journal Dynamics More+

Academic Activities More+

Links More+

Export File

Citation

Format

Content