ISSN 1000-1239 CN 11-1777/TP

Table of Contents

15 December 2005, Volume 42 Issue 12
Paper
Research Advances in Key Technology of P2P-Based Media Streaming
Gong Haigang, Liu Ming, Mao Yingchi, Lu Sanglu, and Xie Li
2005, 42(12):  2033-2040. 
Providing streaming service over the Internet is a challenging problem. The difficulty is twofold. First, it is not a trivial task to stream media on an end-to-end basis because of a video's high bandwidth requirement and long duration. Second, scalability issues arise when attempting to serve a large number of clients. Traditional streaming service employs a client-server unicast service model; as media popularity increases, the server soon becomes the bottleneck. How best to provide media streaming service to a large number of clients in a scalable manner remains an open problem. The recently emerging P2P technologies offer great potential for resource utilization and system scalability, and P2P-based media streaming is attracting a great deal of attention. Research advances in the key technologies of P2P-based media streaming are discussed and the state of the art of P2P media streaming is presented. Finally, some issues for further study are discussed.
An Overview of Reflective Middleware
Du Zhao, Wang Xiaoge, and Chen Yu
2005, 42(12):  2041-2047. 
With the increase in the number and diversity of distributed applications, middleware is expected to be flexible, adaptive, and context-aware. Reflection allows a program to access, reason about, and alter its own interpretation, thus making it possible to build middleware that meets the requirements above. First, detailed information about computational reflection is presented: its origin, definition, related concepts, and applications. Then reflective middleware and component technology are explained. Next, several selected research projects are introduced to show the different approaches that have been used to construct reflective middleware. Finally, some suggestions for future work are made.
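As a toy illustration of computational reflection (not drawn from the paper), the Python sketch below shows a meta-level proxy that inspects and alters how calls on a base object are interpreted at run time, the kind of hook a reflective middleware exposes through a meta-object protocol.

```python
class ReflectiveProxy:
    """Minimal meta-object: intercepts calls on a base object and lets
    the program inspect and alter its own interpretation at run time."""

    def __init__(self, base):
        self._base = base
        self._interceptors = []           # meta level: pre-call hooks

    def add_interceptor(self, hook):
        """Reflective step: change how future calls are interpreted."""
        self._interceptors.append(hook)

    def __getattr__(self, name):
        attr = getattr(self._base, name)
        if not callable(attr):
            return attr
        def wrapped(*args, **kwargs):
            for hook in self._interceptors:
                hook(name, args, kwargs)  # e.g. logging, adaptation, QoS checks
            return attr(*args, **kwargs)
        return wrapped

class Service:
    def ping(self, host):
        return f"pinging {host}"

svc = ReflectiveProxy(Service())
svc.add_interceptor(lambda m, a, k: print(f"[meta] call {m}{a}"))
print(svc.ping("10.0.0.1"))               # meta-level log, then the result
```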
A Dynamic Forwarding Pointer Mobility Management Strategy with a Fuzzy Logic Controller
Zhu Yihua and Yu Li
2005, 42(12):  2048-2055. 
Mobility management is a challenge in mobile computing. A dynamic mobility management strategy with a fuzzy logic controller is proposed, in which the number of boundary crossings of a mobile host, the chain length of forwarding pointers, and the cost of location management are chosen as the inputs of the fuzzy logic controller, while the variation of the chain length, used to adjust the chain length of forwarding pointers, is chosen as its output. Simulation results demonstrate that the proposed strategy outperforms the basic mobility management strategy of existing mobile communication networks when the call-to-mobility ratio is low.
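The abstract does not give the controller's membership functions or rule base, so the following Python sketch is purely illustrative: a Mamdani-style controller with hypothetical triangular memberships over the three stated inputs, producing a chain-length adjustment by weighted-average defuzzification.

```python
def tri(x, a, b, c):
    """Triangular membership function peaking at b."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x < b else (c - x) / (c - b)

def chain_adjustment(crossings, chain_len, cost):
    """Inputs: boundary-crossing count, forwarding-pointer chain length,
    location-management cost. Output: change to the allowed chain length."""
    # Fuzzify over hypothetical universes of discourse.
    low_cross, high_cross = tri(crossings, -1, 0, 10), tri(crossings, 5, 15, 100)
    short_chain, long_chain = tri(chain_len, -1, 0, 6), tri(chain_len, 3, 8, 20)
    low_cost, high_cost = tri(cost, -1, 0, 50), tri(cost, 25, 75, 1000)

    # Hypothetical rule base, e.g. "IF cost high AND chain long THEN shorten".
    rules = [
        (min(high_cost, long_chain), -2.0),   # shorten the chain
        (min(low_cost, short_chain), +2.0),   # allow a longer chain
        (min(high_cross, low_cost),  +1.0),   # frequent movers: extend chain
        (min(low_cross, high_cost),  -1.0),
    ]
    num = sum(w * v for w, v in rules)
    den = sum(w for w, _ in rules)
    return num / den if den else 0.0          # weighted-average defuzzification

print(chain_adjustment(crossings=12, chain_len=9, cost=80))  # negative: shorten
```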
Research on the Node Spatial Probabilistic Distribution of the Random Waypoint Mobility Model for Ad Hoc Network
Shi Rui and Yang Xiaozong
2005, 42(12):  2056-2062. 
The Random Waypoint model is a commonly used mobility model in the field of ad hoc networks. The spatial probabilistic distribution of this mobility model is studied in depth, and the exact spatial distribution function of the generalized model, with pause time and velocity randomly chosen, is obtained in both 1D and 2D regions. Simulation results validate the distribution functions. The exact function shows how pause time and velocity affect the node spatial distribution. The results provide a theoretical foundation for derivations and proofs of ad hoc network theory based on the Random Waypoint mobility model. The exact distribution is also of practical significance for performance-evaluation simulations based on the Random Waypoint model.
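As a quick empirical companion (a simulation sketch, not the paper's closed-form derivation), the following Python snippet samples node positions under the Random Waypoint model in a unit square; the fraction of mass in the central region exceeds the uniform baseline, illustrating the well-known concentration toward the center.

```python
import random

def rwp_positions(n_samples=200_000, v_min=0.1, v_max=1.0, p_max=1.0, dt=0.01):
    x, y = random.random(), random.random()
    samples = []
    while len(samples) < n_samples:
        tx, ty = random.random(), random.random()       # next waypoint
        v = random.uniform(v_min, v_max)                # random speed
        dist = ((tx - x) ** 2 + (ty - y) ** 2) ** 0.5
        steps = max(1, int(dist / (v * dt)))
        for i in range(1, steps + 1):                   # sample along the leg
            samples.append((x + (tx - x) * i / steps,
                            y + (ty - y) * i / steps))
        x, y = tx, ty
        pause_steps = int(random.uniform(0, p_max) / dt)
        samples.extend([(x, y)] * pause_steps)          # pause at the waypoint
    return samples[:n_samples]

pos = rwp_positions()
center = sum(1 for px, py in pos if 0.25 < px < 0.75 and 0.25 < py < 0.75)
# The central box covers 25% of the area but holds well over 25% of the mass.
print(f"fraction in central quarter of area: {center / len(pos):.3f}")
```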
ALBC4WS: A Dynamic Service Composition Framework Based on the Software Architecture Lifecycle
Rao Yuan, Feng Boqing, and Li Zunchao
2005, 42(12):  2063-2069. 
As a new distributed computing paradigm for the loosely coupled Internet environment, Web services provide a new technology for B2B dynamic e-business across different organizations. Several concepts of Web service composition are defined and their properties are analyzed from the architecture perspective. A Web service composition model based on the software architecture life cycle, the ALBC4WS model, is proposed, which integrates the merits of existing composition strategies and is built on a service publication management algorithm and a service automatic inquiry-and-resume management algorithm for composition. This model therefore provides a dynamic and manageable underlying framework for service composition. Application results from the development of the OPEN-WEB prototype system show that the ALBC4WS model effectively increases the composition capability of service components and improves the robustness and self-adaptability of service-composition-based applications, while also providing a dynamic management mechanism for automatic service composition.
An Enhanced TCP Congestion Control Algorithm
He Yanxiang, Xiong Naixue, and Yang Yan
2005, 42(12):  2070-2076. 
At present, the TCP congestion control algorithm is widely applied on the Internet as a reliable data transmission protocol. Once reliable data transmission is guaranteed, fairness among different TCP traffic flows is an important performance index of the network. Fairness algorithms for TCP flows have been proposed for single-bottleneck networks, but little work so far has addressed fairness among TCP flows over multiple bottleneck links. Therefore, an enhanced TCP congestion control algorithm for multiple-bottleneck networks is proposed, realized in IP networks that support the ECN mechanism at the network layer. Simulation results demonstrate that remarkable fairness among TCP data flows in multiple-bottleneck networks can be achieved with this algorithm. Furthermore, the proposed algorithm has higher throughput and faster response than the TCP algorithm currently in use, and thus performs better in multiple-bottleneck networks.
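The paper's exact window rules are not given in the abstract; the Python sketch below only illustrates the network-layer building block it relies on: an AIMD sender that halves its congestion window when an ACK carries an ECN mark, rather than waiting for packet loss.

```python
def aimd_with_ecn(acks, mss=1):
    """acks: iterable of booleans, True if the ACK carries an ECN-Echo mark.
    Returns the congestion-window trace (illustrative AIMD, not the paper's)."""
    cwnd, ssthresh = 1.0, 64.0
    trace = []
    for ecn_marked in acks:
        if ecn_marked:
            ssthresh = max(cwnd / 2.0, 2.0)  # multiplicative decrease
            cwnd = ssthresh                  # no retransmit needed: no loss
        elif cwnd < ssthresh:
            cwnd += mss                      # slow start
        else:
            cwnd += mss * mss / cwnd         # congestion avoidance
        trace.append(cwnd)
    return trace

# Mark every 50th ACK to mimic a congested multi-bottleneck path.
print([round(w, 1) for w in aimd_with_ecn([i % 50 == 49 for i in range(200)])][::20])
```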
A New Technique of Demultiplexing Distributed Packet Load-Balancing for Parallel Packet Switch
Falah Mousa Falah ALALI (Jordan)
2005, 42(12):  2077-2083. 
Packet load-balancing is a key technique for the parallel packet switch (PPS). Practical load-balancing techniques with IP routing table lookup are known for line rates up to OC-768 (40 Gbps) or OC-3072 (160 Gbps). However, with today's technology it is difficult to implement packet processing at such high line rates. Based on demultiplexing packet processing and packet load-balancing, a demultiplexing distributed architecture for packet load-balancing and its algorithm DDPA (demultiplexing distributed parallel packet switch algorithm) are presented. Moreover, a modified, more practical DDPA based on packet discarding is proposed. The validity of the technique is proved and the key parameters are deterministically guaranteed.
An IP Traceback Scheme with Packet Marking in Blocks
Qu Haipeng, Li Dequan, Su Purui, and Feng Dengguo
2005, 42(12):  2084-2092. 
DDoS attacks are a serious problem for the Internet community due to their high profile, severe damage, and the difficulty of defending against them. Several countermeasures have been proposed in the literature, among which probabilistic packet marking (PPM), first developed by Savage et al., is promising. However, PPM marking schemes have limitations in two main aspects: high computation overhead and a large number of false positives. An IP traceback scheme with packet marking in blocks is proposed, which is more practical because of its higher precision, and is computationally more efficient than the PPM scheme proposed by Savage.
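For orientation, here is a deliberately simplified node-sampling simulation in the spirit of Savage et al.'s PPM (the block-marking scheme of this paper differs, and the marking probability here is illustrative); it shows why a victim can reconstruct router order from mark frequencies alone.

```python
import random
from collections import Counter

P_MARK = 0.2                               # per-router marking probability

def forward(path):
    """Each router on the path overwrites the mark with probability P_MARK."""
    mark = None
    for router in path:
        if random.random() < P_MARK:
            mark = router
    return mark

attack_path = ["R1", "R2", "R3", "R4"]     # attacker -> victim
marks = Counter(forward(attack_path) for _ in range(100_000))
marks.pop(None, None)                      # drop unmarked packets

# Routers far from the victim are overwritten more often, so expected mark
# frequency increases toward the victim; sorting by count recovers the order.
print([r for r, _ in sorted(marks.items(), key=lambda kv: kv[1])])
```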
An Access Control Model for DBMS Based on Dynamic Context Stack
Xu Zhen, Li Lan, and Feng Dengguo
2005, 42(12):  2093-2099. 
Stored procedures are dynamic entities in DBMSs. Determining the privilege set for their execution is a key problem for effective access control in DBMSs. The approaches usually adopted violate the principle of least privilege, which leads to a series of security vulnerabilities in DBMSs. In addition, the cascading execution of stored procedures makes it difficult to limit the scope in which the privilege set applies. To address these difficulties, a DBMS access control model based on a dynamic context stack is presented, which takes operation sequences as its input and determines the privilege set for the execution of a stored procedure based on the context stack. It not only supports the principle of least privilege well, but also has good manageability and scalability.
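A minimal sketch of the stack idea under assumed semantics (the paper's model is richer): each stored procedure pushes a privilege set intersected with its invoker's, so cascaded executions can never exceed least privilege.

```python
class ContextStack:
    def __init__(self, session_privs):
        self.stack = [set(session_privs)]   # base frame: the caller's session

    def enter_procedure(self, proc_privs):
        # Assumed policy: effective privileges never grow beyond the
        # invoker's, enforcing least privilege across cascaded calls.
        self.stack.append(set(proc_privs) & self.stack[-1])

    def exit_procedure(self):
        self.stack.pop()

    def check(self, priv):
        return priv in self.stack[-1]       # check against the current frame

ctx = ContextStack({"SELECT:orders", "EXEC:audit", "SELECT:users"})
ctx.enter_procedure({"SELECT:orders", "DELETE:orders"})  # DELETE not inherited
print(ctx.check("SELECT:orders"))   # True
print(ctx.check("DELETE:orders"))   # False: the invoker lacked it
ctx.exit_procedure()
```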
Abstract Security Properties in Process Algebra
Zhou Wei, Yin Qing, and Wang Qingxian
2005, 42(12):  2100-2105. 
A theory of abstract security properties is presented in process algebra, with abstract security properties defined as special kinds of equivalence classes of processes, that is, the sets of processes that are equally secure. In the context of process algebra, the investigation of abstract security properties can be cast as an investigation of the properties of process algebra operators. It is proved that partial orders can be defined on the sets of abstract security properties to generate CPOs, so the theory of CPOs and fixed points can be used. The actions of algebra operators on the sets of abstract security properties are investigated, and a theorem shows that process algebra operators are monotonic functions. From this theorem, (1) compositionally invariant security properties and constructive security properties are proved to exist, and (2) security properties are shown to degrade under the operators of process algebra, which is known as the "bucket principle", i.e., a composed system cannot be more secure than the weakest link of the system. Finally, a formal definition of absolute security is given and associated with trivial properties of processes; a theorem shows that absolute security is itself a trivial property.
A Robust Itinerary Protection Based on Mobile Agents
Liu Yi and Wang Yumin
2005, 42(12):  2106-2110. 
A mobile agent is a piece of software that roams the network on behalf of a user. In recent years, mobile agent technologies have received a great deal of interest and seem to offer a solution to many problems in the area of distributed systems. However, some security issues have been regarded as obstacles to the application of mobile agents, and route protection is one of them. It is important to protect the routes of a mobile agent if it will visit a list of hosts or dispatch other mobile agents to them. In this paper, a secure itinerary protection scheme based on atomic encryptions and atomic signatures is presented which satisfies a set of general security properties for agent route protection. Compared with schemes based on nested encryptions and nested signatures, the protocol reduces the computational cost of route protection. A secure and robust route for mobile agents is then given, which lets mobile agents bypass unreachable hosts and continue the owner's task without requiring secure environments on the hosts that generate the mobile agents.
A Multi-Scale Image Edge Detection Model Based on the Gap Statistic of the Order Wilcoxon Rank Sum
Huang Chenrong, Zhang Zhengjun, and Wu Huizhong
2005, 42(12):  2111-2117. 
On the basis of order Wilcoxon rank sum statistics in the relative half-neighborhoods of image pixels, the concepts of the order Wilcoxon rank sum statistic, the gap statistic of the order Wilcoxon rank sum, and the membership degree of an image edge are proposed. Based on the gap statistic of the order Wilcoxon rank sum, a multi-scale image edge detection model is established. Because the proposed model is multi-scale, has excellent noise resistance in the interior of regions, and weakens noise on faint edges, the corresponding algorithm performs well in image edge detection. The difference in edge detection between the model and the Prewitt operator is analyzed, and numerous experimental examples verify the capability of the model. Finally, edge detection results under different scales are compared.
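To make the statistic concrete, here is an illustrative computation (assumed neighborhood setup, not the paper's exact definitions): the Wilcoxon rank sum of one half-neighborhood within the pooled samples, with the gap from its no-edge expectation used as an edge score.

```python
def rank_sum(left, right):
    """Wilcoxon rank-sum W of `left` within the pooled, ranked samples."""
    pooled = sorted((v, i) for i, v in enumerate(left + right))
    ranks = {}
    k = 0
    while k < len(pooled):                 # assign midranks to ties
        j = k
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[k][0]:
            j += 1
        mid = (k + j) / 2 + 1
        for t in range(k, j + 1):
            ranks[pooled[t][1]] = mid
        k = j + 1
    return sum(ranks[i] for i in range(len(left)))

def edge_score(left, right):
    """Gap between W and its no-edge expectation n(m+n+1)/2, normalized."""
    n, m = len(left), len(right)
    w = rank_sum(left, right)
    expected = n * (m + n + 1) / 2
    return abs(w - expected) / expected

flat = edge_score([100, 102, 101, 99], [101, 100, 98, 102])
edge = edge_score([100, 102, 101, 99], [180, 175, 182, 179])
print(f"flat: {flat:.2f}  edge: {edge:.2f}")   # the edge score is much larger
```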
Texture Synthesis by the Border Image
Yang Gang, Wang Wencheng, and Wu Enhua
2005, 42(12):  2118-2125. 
It is often difficult for existing texture synthesis methods to preserve the border structures of the sample texture. To address this, a new method is presented that lets the borders take effect in producing new textures via a border image, so that the border structures can be well preserved. The method is particularly suitable for the "texel-style texture", a type of texture composed of independent color pattern units. In this method, a border image is first extracted from the sample texture. Then, guided by the border image, respective synthesis methods are used for the two kinds of texel-style textures: the "patch overlaying method" for overlaid textures and the "patch matching method" for un-overlaid textures. Compared with existing methods, the presented method preserves border structures more effectively, produces satisfactory results for texel-style textures, and runs fast.
Image Fusion at Similar Scale
Chen Tao, Yi Mo, Liu Zhongxuan, and Peng Silong
2005, 42(12):  2126-2130. 
A novel idea that image fusion should be performed at a similar scale is proposed, and based on this idea an image fusion algorithm is presented. First, the "à trous" discrete wavelet transform is used to decompose the panchromatic image so that its approximation and the interpolated multispectral image have a similar scale. Second, image fusion based on the weighted multiscale fundamental form (WMFF) is performed on them to obtain the synthesized approximation. Finally, the inverse "à trous" wavelet transform of the synthesized approximation and the detail of the panchromatic image reconstructs a high-spatial-resolution multispectral image. Compared with other wavelet-based techniques in the literature, the proposed image fusion algorithm has better performance.
A New Soft-Tissue Cutting Algorithm Based on Element Subdivision
Xiong Yueshan, Luo Jun, Tan Ke, Wang Yanzhen, and Guo Guangyou
2005, 42(12):  2132-2136. 
Cutting is one of the simulations of real actions in a virtual surgery system; realism and real-time performance are the key factors in modeling cutting. It is therefore necessary to study a new soft-tissue cutting algorithm, which belongs to the class of element subdivision algorithms. The algorithm differs from other element subdivision algorithms: it divides the whole subdivision process into two smaller steps. First, an incompletely cut element is decomposed into completely cut elements; then basis-element decomposition is performed according to the different cutting cases. Experimental results show that using this new algorithm with a tetrahedral finite element model provides good visual results.
A Fast Image Encryption Algorithm Based on Chaos System and Henon Map
Zhang Han, Wang Xiufeng, Li Zhaohui, and Liu Dahai
2005, 42(12):  2137-2142. 
On the basis of a chaos system and the Henon map, a new fast image encryption algorithm is presented. A two-dimensional reversible nonlinear Henon map, reduced by a modulus operation, is used to iterate the gray values of pixels circularly in a chained fashion. In each step, the iteration count and the parameters of the Henon map are set from the values of a chaotic sequence generated by a one-dimensional chaotic map. The algorithm is computationally fast due to its simple design, and ciphertext can be deciphered accurately. In particular, it solves the problem of decryption failure caused by the limited and varying numerical precision of different computers, which commonly affects algorithms based on chaos or nonlinear maps. The algorithm also avoids the inherent defects and inadequate cryptographic strength of other algorithms that adopt permutation transforms, and thus offers superior security.
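A minimal keystream-style sketch of the ingredients (an assumed construction with hypothetical seeds and parameter ranges, not the paper's exact cipher): a 1-D logistic map supplies iteration counts and parameters for a modulus-reduced Henon map, whose state is turned into bytes XORed with the pixel gray values.

```python
def logistic(x, mu=3.99):
    return mu * x * (1 - x)                  # 1-D chaotic driver

def henon_keystream(n, key=0.345678, m=256):
    x = key
    hx, hy = 0.1, 0.3                        # Henon state (hypothetical seed)
    stream = []
    for _ in range(n):
        x = logistic(x)
        a = 1.0 + 0.4 * x                    # Henon parameter from chaos
        for _ in range(1 + int(x * 7)):      # iteration count from chaos
            hx, hy = (1 - a * hx * hx + hy) % 1.0, (0.3 * hx) % 1.0
        stream.append(int(hx * m) % m)       # keystream byte from the state
    return stream

def xor_pixels(pixels, key=0.345678):
    ks = henon_keystream(len(pixels), key)
    return [p ^ k for p, k in zip(pixels, ks)]   # the same call decrypts

plain = [120, 121, 119, 200, 0, 255]
cipher = xor_pixels(plain)
print(cipher)
print(xor_pixels(cipher) == plain)           # True: XOR is an involution
```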
A Model Based on Local Displacement Fitting with BPNN for Calculating Myocardial Deformation
Zhu Jin, Xia Deshen, and Heng PhengAnn
2005, 42(12):  2143-2148. 
Tagged MRI affords a non-invasive method for tracing motion in the myocardium. Displacement information can be obtained from the sparse, distorted tagging curves sampled at given times in the heartbeat period, via the segmentation results in tagged MRI. Recovering nearly all left-ventricle (LV) motion from this information is a challenge. In this paper a model for calculating cardiac deformation is proposed. First, the intersection points of tagging lines, obtained from the segmentation results in tagged MRI, are used to fit the local myocardial displacement with a BP neural network (BPNN). Then the motion of material points in the heart is calculated by iterative computation of the displacement functions, and the myocardial deformation is derived. Experiments show that this model has the advantages of explicit physical meaning, a simple algorithm, and high computational precision.
An Adaptive Jitter Buffering Algorithm for Voice over IP Networks
Gou Xiantai, Jin Weidong, and Jin Fan
2005, 42(12):  2149-2154. 
The continuous playout of voice packets in the presence of variable network delays is often achieved by buffering received voice packets for sufficient time. Basic jitter buffering algorithms work well only when the delay in the IP network does not spike. In this work, an adaptive jitter buffering algorithm based on detecting and studying the spike status of the network is presented to improve the quality of voice communication. It adjusts the minimal and maximal depth of the buffer queue in a timely manner according to the control targets for end-to-end delay and packet loss rate. The algorithm achieves continuous playout much more easily because it plays voice packets at a fixed inter-play time for most of a talk-spurt. The control target for packet loss rate can be extended to 20%, whereas the basic algorithms can only tolerate a packet loss rate of 5%~10%. Perceptual evaluation of speech quality (PESQ) is applied to assess speech quality in the simulation. The results show that the algorithm clearly improves the quality of voice communication in IP networks with delay spikes, and its practical application in a voice gateway confirms the improvement in voice quality.
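For background, the sketch below shows a classic adaptive playout estimator with spike detection in the style of Ramjee et al. (constants and thresholds are assumed; the paper's min/max queue-depth control and fixed per-talkspurt playout go further).

```python
class PlayoutEstimator:
    ALPHA, BETA, SPIKE_THRESHOLD = 0.875, 4.0, 400.0   # assumed constants (ms)

    def __init__(self):
        self.d, self.v = 100.0, 20.0      # smoothed delay / deviation (ms)
        self.in_spike = False
        self.pre_spike_d = 0.0

    def update(self, n):
        """n: measured one-way network delay (ms); returns playout delay."""
        if not self.in_spike and n > self.d + self.SPIKE_THRESHOLD:
            self.in_spike, self.pre_spike_d = True, self.d   # spike begins
        if self.in_spike:
            self.d = n                                   # follow the spike
            if n < self.pre_spike_d + self.v:            # spike is over
                self.in_spike = False
        else:
            a = self.ALPHA                               # normal EWMA tracking
            self.v = a * self.v + (1 - a) * abs(n - self.d)
            self.d = a * self.d + (1 - a) * n
        return self.d + self.BETA * self.v               # mean + safety margin

est = PlayoutEstimator()
trace = [80] * 30 + [900, 950, 870] + [80] * 30          # ms, one delay spike
print([round(est.update(n)) for n in trace][::6])
```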
Text Representation Using Domain Dictionary
Chen Wenliang, Zhu Jingbo, Zhu Muhua, and Yao Tianshun
2005, 42(12):  2155-2160. 
In this paper an approach to improving the performance of text categorization is presented, using machine learning and a domain dictionary. Domain-dictionary-based text representation enhances the expressiveness of text features and reduces the feature dimensionality. But the size of a domain dictionary is limited and some words are not included in it, so a machine learning technique named the self-partition model is proposed to resolve this; the model automatically maps words to domain features. A text categorization system is then developed that uses these learned domain features as text features. The experimental results show that the proposed approach improves the performance of text categorization and provides high accuracy when the feature set is small: with 500 features, it yields a 6.58% F1 improvement over the system based on BOW.
Study of Determining a Conic with Five Constrained Points and Its Application in Parametric Interpolation
Liu Yi and Zhang Caiming
2005, 42(12):  2161-2168. 
Discussed in this paper are the problem of determining a parabola from four ordered points and the effect of the geometric distribution of five ordered constrained points on the shape of the conic. A computational method for uniquely determining a conic from five distinct planar points is introduced. Conversion formulas between implicit conics and rational quadratic Bézier curves are also proposed, which make it convenient to compute the parameters of the interpolation knots. Based on this study, a new computational technique of rational re-parameterization of the constructed knots' parameters is presented. Experimental results show that the error is reduced considerably and the interpolation precision is higher for the constructed quadratic curve. Experiments comparing the efficiency of the two new methods with that of other methods are also included.
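One standard way to determine a conic from five points (not necessarily the paper's method) is to solve the homogeneous system for the coefficients of a x^2 + b xy + c y^2 + d x + e y + f = 0, e.g. via an SVD null-space computation:

```python
import numpy as np

def conic_through(points):
    """points: five (x, y) pairs; returns (a, b, c, d, e, f)."""
    rows = [[x * x, x * y, y * y, x, y, 1.0] for x, y in points]
    _, _, vt = np.linalg.svd(np.array(rows))
    return vt[-1]        # right singular vector of the smallest singular value

# Five points on the unit circle x^2 + y^2 - 1 = 0.
pts = [(1, 0), (0, 1), (-1, 0), (0, -1),
       (np.sqrt(2) / 2, np.sqrt(2) / 2)]
coef = conic_through(pts)
print(np.round(coef / coef[0], 6))      # ~ [1, 0, 1, 0, 0, -1]

# Residual check: all five points satisfy the recovered conic equation.
a, b, c, d, e, f = coef
print(max(abs(a*x*x + b*x*y + c*y*y + d*x + e*y + f) for x, y in pts))
```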
An RT-Level Vector Generation Method for Observability-Based Statement Coverage
Lu Wei, Lü Tao, Yang Xiutao, and Li Xiaowei
2005, 42(12):  2169-2175. 
The traditional statement coverage metric, based only on the activation of statements and without taking observability into account, can produce an artificially high coverage reading and a false sense of confidence. The observability-based statement coverage metric is therefore proposed; it computes observability information to determine whether the effects of errors activated by the program stimuli can be observed. As the density and complexity of circuits grow, this metric plays an increasingly important role in verification. Introduced in this paper is a vector generation method for the observability-based statement coverage metric. The contribution of the work has two aspects. First, precise and concise abstract representations are derived from HDL descriptions to model observability information. Second, a novel simulation-based algorithm is presented to generate vectors for observability-based statement coverage. The algorithm always tries to cover all unobserved statements and reduces unnecessary backtracking, so it is efficient. Finally, the proposed method has been implemented as a prototype tool for VHDL designs, and results on benchmarks show significant benefits.
Hardware/Software Partitioning Based on Ant Optimization with Initial Pheromone
Xiong Zhihui, Li Sikun, and Chen Jihua
2005, 42(12):  2176-2183. 
The ant optimization algorithm reaches optimal results efficiently, but it lacks initial pheromone at the beginning, which limits its further improvement. A hardware/software partitioning algorithm is presented for platform-based system-on-a-chip design. This algorithm is based on ant optimization with initial pheromone, called AOwIP, and its basic ideas are: a) use the partitioning result of the reference design provided by the platform-based design method as the initial partition of the current design, and convert this initial partition into initial pheromone for the ant algorithm; and b) starting from the generated initial pheromone, AOwIP exploits the positive feedback and efficient convergence of the ant algorithm to search for the optimal partitioning scheme. AOwIP uses the system-level reuse feature of platform-based design to overcome the ant algorithm's lack of initial pheromone. Experiments show that AOwIP improves the performance of the ant algorithm on hardware/software partitioning problems.
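A compact sketch of the AOwIP seeding idea (the cost model, parameters, and reference partition below are invented for illustration): pheromone for each task/implementation choice starts from a prior boosted along the reference design's partition rather than a flat prior, before standard ant construction runs.

```python
import random

SW_TIME = [9, 4, 7, 3, 8, 5]          # per-task software execution times
HW_AREA = [6, 2, 5, 1, 7, 3]          # per-task hardware area costs
REF     = [1, 0, 1, 0, 1, 0]          # reference partition (1 = hardware)
N, RHO, Q, SEED_BOOST = len(SW_TIME), 0.3, 10.0, 3.0

def cost(part, area_budget=15):
    time = sum(t for t, p in zip(SW_TIME, part) if p == 0)
    area = sum(a for a, p in zip(HW_AREA, part) if p == 1)
    return time + (100 if area > area_budget else 0)   # penalize infeasibility

# Initial pheromone: flat prior boosted along the reference partition.
tau = [[1.0 + SEED_BOOST * (REF[i] == c) for c in (0, 1)] for i in range(N)]

best, best_cost = None, float("inf")
for _ in range(50):                                    # iterations
    for _ant in range(8):                              # ants construct tours
        part = [random.choices((0, 1), weights=tau[i])[0] for i in range(N)]
        c = cost(part)
        if c < best_cost:
            best, best_cost = part, c
    for i in range(N):                                 # evaporate + deposit
        for ch in (0, 1):
            tau[i][ch] *= (1 - RHO)
        tau[i][best[i]] += Q / best_cost
print(best, best_cost)
```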
ReDE: A Regular Expression-Based Method for Extracting Biological Data
Deng Xubin and Zhu Yangyong
2005, 42(12):  2184-2191. 
Extracting data from heterogeneous biological data sources to build a query and analysis platform for biological scientists is currently a hot research topic. In general, the data extraction process involves many interdependent metadata. Making full use of dependencies among metadata to generate one metadata item from another can reduce metadata maintenance overhead; however, many data extraction methods overlook these dependencies and require much effort to construct and maintain many metadata. In this paper, a regular expression (RE) based method named ReDE is proposed to avoid this drawback: by building a parse tree for RE groups, an RE-based algorithm for generating a relational database schema and a general data extraction and assembly algorithm are designed. The novelty is that the RE is the only necessary metadata, and its management and maintenance are relatively easy. This method can serve as the basis for a biological database design aid and a highly automated data extraction tool, and it has been applied to extract data for China's first online integrated biological data warehouse.
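A minimal sketch of the core idea with a hypothetical record format: one regular expression with named groups serves both as the schema definition and as the extractor, so the RE is the only metadata to maintain.

```python
import re

# Hypothetical flat-file records, e.g. a SwissProt-like fragment.
RAW = """ID P12345; Human; LEN 154
ID Q67890; Mouse; LEN 212
"""

RECORD_RE = re.compile(
    r"ID\s+(?P<accession>\w+);\s*(?P<organism>\w+);\s*LEN\s+(?P<length>\d+)"
)

def schema_from_re(pattern):
    """Derive a relational schema (column list) from the RE's named groups."""
    return sorted(pattern.groupindex, key=pattern.groupindex.get)

def extract(pattern, text):
    """Each match becomes one row keyed by the named groups."""
    return [m.groupdict() for m in pattern.finditer(text)]

print(schema_from_re(RECORD_RE))   # ['accession', 'organism', 'length']
for row in extract(RECORD_RE, RAW):
    print(row)
```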
Mining Frequent Patterns in Data Streams
Liu Xuejun, Xu Hongbing, Dong Yisheng, Wang Yongli, and Qian Jiangbo
2005, 42(12):  2192-2198. 
Finding frequent items is one of the most basic problems in data streams. The unbounded and transient nature of data streams makes traditional frequent-pattern algorithms difficult to extend to them. In view of the characteristics of data streams, and inspired by the effectiveness of FP-growth for frequent pattern mining, a new FP-DS algorithm for mining frequent patterns from data streams is proposed. The algorithm partitions the data stream and mines frequent items step by step, so users can continuously obtain the current frequent items online, and frequent patterns of any length can be mined effectively. By introducing an error bound ε, a large number of non-frequent items are pruned and the storage space for the data stream is reduced; the algorithm guarantees that the error of the support does not exceed ε. Analysis and experiments show that the algorithm performs well.
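FP-DS itself mines full patterns; as a sketch of the ε-bounded counting idea alone, here is the well-known Lossy Counting algorithm for single items, which likewise processes the stream in buckets and guarantees an undercount of at most εN.

```python
def lossy_counting(stream, epsilon=0.01):
    """Counts are underestimated by at most epsilon * N items seen so far."""
    counts, bucket = {}, 1           # item -> (count, max undercount delta)
    width = int(1 / epsilon)
    for n, item in enumerate(stream, 1):
        if item in counts:
            c, d = counts[item]
            counts[item] = (c + 1, d)
        else:
            counts[item] = (1, bucket - 1)
        if n % width == 0:           # end of bucket: prune infrequent entries
            counts = {k: (c, d) for k, (c, d) in counts.items()
                      if c + d > bucket}
            bucket += 1
    return counts

stream = ["a"] * 500 + ["b"] * 300 + list("xyzuvw") * 30
result = lossy_counting(iter(stream), epsilon=0.05)
frequent = {k: c for k, (c, d) in result.items() if c >= 0.2 * len(stream)}
print(frequent)                      # 'a' and 'b' survive; rare items pruned
```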
Termination Analysis of Active Rule Based on Dependency Set
Hao Zhongxiao, Ren Chao, and Zhao Lingqiang
2005, 42(12):  2199-2205. 
Termination of an active rule set is one of the three important characteristics of active database rule sets, and it directly influences the application of active database systems. By analyzing the dependency relationships that exist among active rules, concepts such as the triggering-transition closure and the dependency-transition closure are introduced. Based on these concepts, an approach is proposed to analyze the termination of an active rule set via the triggering-dependency graph (T-DG) of the active rules. In particular, the approach for deciding termination of the active rule set corresponding to a triggering graph (TG) that includes a cycle is studied. The corresponding decision algorithms, together with their verification and analysis, are also given.
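As a minimal companion (standard graph machinery, not the paper's dependency-set refinement), the sketch below builds a triggering graph over active rules and reports the rules that lie on a triggering cycle, the situations the finer analysis must then examine.

```python
def triggering_cycles(rules):
    """rules: dict rule -> set of rules its action may trigger.
    Returns the set of rules that lie on some triggering cycle."""
    on_cycle = set()
    for start in rules:
        stack, seen = [(start, [start])], set()
        while stack:
            node, path = stack.pop()
            for nxt in rules.get(node, ()):
                if nxt == start:
                    on_cycle.update(path)      # found a cycle back to start
                elif nxt not in seen:
                    seen.add(nxt)
                    stack.append((nxt, path + [nxt]))
    return on_cycle

RULES = {"r1": {"r2"}, "r2": {"r3"}, "r3": {"r1"}, "r4": {"r2"}}
print(sorted(triggering_cycles(RULES)))        # ['r1', 'r2', 'r3']
```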
An Approach for Detecting Similar Duplicate Records of Massive Data
Han Jingyu, Xu Lizhen, and Dong Yisheng
2005, 42(12):  2206-2212. 
Detecting similar duplicate records in massive data is of great importance in data cleaning. An efficient method for detecting similar duplicate records in massive data is presented: hierarchy spaces are put forward, constituted by a sequence of component spaces called q-gram spaces. First, every record to be cleaned is mapped to a point in the corresponding component space. Then, taking advantage of the inherent hierarchy of the hierarchy spaces, all similar duplicate records are detected by hierarchical clustering. On one hand, this method overcomes the shortcoming of the traditional 'sort & merge' method, in which similar duplicate records may fall far from each other during the sorting phase and thus cannot be found in the succeeding merge phase; on the other hand, it greatly reduces expensive disk I/O cost by avoiding external sorting. Both theory and experiments show that it is an effective approach for detecting similar duplicate records in massive data.
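An illustrative q-gram sketch (details assumed, not the paper's exact mapping): each record is mapped to its q-gram multiset, and a similarity over shared q-grams brings near-duplicates together even when a sort key would separate them.

```python
from collections import Counter

def qgrams(s, q=2):
    s = f"#{s.lower()}#"                      # pad so the ends contribute grams
    return Counter(s[i:i + q] for i in range(len(s) - q + 1))

def similarity(a, b, q=2):
    """Dice-style similarity over shared q-gram counts, in [0, 1]."""
    ga, gb = qgrams(a, q), qgrams(b, q)
    shared = sum((ga & gb).values())
    return 2 * shared / (sum(ga.values()) + sum(gb.values()))

records = ["John Smith, NYC", "Jon Smith, N.Y.C.", "Mary Jones, LA"]
for i in range(len(records)):
    for j in range(i + 1, len(records)):
        print(records[i], "|", records[j], "->",
              round(similarity(records[i], records[j]), 2))
```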
VA-Trie: A New and Efficient High Dimensional Index Structure for Approximate k Nearest Neighbor Query
Dong Daoguo, Liu Zhenzhong, and Xue Xiangyang
2005, 42(12):  2213-2218. 
Since the 1990s, great progress has been made in content-based multimedia information retrieval, and a very challenging problem has emerged at the same time: how to organize high-dimensional vectors so that efficient similarity queries can be realized. Many index structures have been proposed to solve this problem, such as the R-Tree and its variants, the VA-File, and the A-Tree. From the published results, it can be concluded that most of these methods achieve good query performance when the dimensionality is less than 20, but performance suffers greatly as the dimensionality increases. To obtain efficient similarity queries in higher-dimensional spaces, a new index structure called the VA-Trie is introduced. The key idea behind the VA-Trie is to adopt quantization to compress the vectors and then employ a trie structure to organize and manage the approximations. The experimental results show that the VA-Trie outperforms the A-Tree and sequential scan in high-dimensional spaces.
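A minimal sketch of the quantize-then-trie idea (the cell coding below is assumed, not the published structure): each dimension is quantized into 2^b cells and the resulting approximation is used as a trie key, so nearby vectors share trie paths.

```python
class TrieNode:
    def __init__(self):
        self.children, self.vectors = {}, []

def approximation(vec, bits=2):
    """Per-dimension quantization of a vector in [0, 1)^d into cell indices."""
    cells = 1 << bits
    return tuple(min(int(v * cells), cells - 1) for v in vec)

def insert(root, vec, bits=2):
    node = root
    for cell in approximation(vec, bits):     # one trie level per dimension
        node = node.children.setdefault(cell, TrieNode())
    node.vectors.append(vec)

def candidates(root, query, bits=2):
    """Collect vectors whose approximation matches the query's cells exactly;
    a real k-NN search would also visit neighboring cells."""
    node = root
    for cell in approximation(query, bits):
        if cell not in node.children:
            return []
        node = node.children[cell]
    return node.vectors

root = TrieNode()
for v in [(0.1, 0.9), (0.12, 0.88), (0.7, 0.2)]:
    insert(root, v)
print(candidates(root, (0.11, 0.91)))         # the two nearby vectors
```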