ISSN 1000-1239 CN 11-1777/TP

Table of Content

15 May 2010, Volume 47 Issue 5
Research Advances and Challenges of Body Sensor Network (BSN)
Gong Jibing, Wang Rui, and Cui Li
2010, 47(5):  737-753. 
Asbtract ( 1494 )   HTML ( 3)   PDF (3056KB) ( 800 )  
Related Articles | Metrics
As one branch of WSN and important part of the Internet of Things, Body Sensor Network(BSN) is utilized to improve people's Healthcare and Medicine by pervasive computing, intelligent information processing, new network technologies and services. So, it is paid more and more attention to by researchers and enterprises. Existing survey researches have given full discussion on the advances and challenges of BSN's biosensor, wireless communication architecture and data security. However, focused on in this paper are BSN's data fusion, context-aware sensing, and system technologies from perspectives of technology challenges, research advance and development trend. It introduces basic concepts and research situation of BSN are introduced. Besides, BSN's system architecture, representative applications and research projects are introduced. At the same time, future research and application prospects of BSN are given. Some practical problems about restriction on BSN research and development are also presented. The contributions are to propose the BSN architecture, analyze the challenges of BSN, and point out the practical problems restricting BSN's development. Through studying abundant references and designing effective introduction structure, the hot and key scientific issues about BSN are given clearly and thus helps other researchers, especially beginners, to know BSN's situation well. At last, although lots of challenges still exist, the developing trend of BSN will be that multi-techniques are integrated to achieve intelligence, effectiveness and usability of it. And, BSN will become an inevitable choice for future healthcare monitoring service.
Multi-Query Processing Technology of Approximate Continuous Queries in Wireless Sensor Networks
He Wenlin and Chen Hong
2010, 47(5):  754-761. 
Asbtract ( 391 )   HTML ( 1)   PDF (1271KB) ( 382 )  
Related Articles | Metrics
Wireless sensor networks opens up a fresh research area of database, where efficient use of sensors' limited energy is the primary goal. In general, queries issued to wireless sensor networks are approximate (precise results are not required due to networks' constraint and saving limited energy on sensors), continuous (used to monitor the trend of physical world) and running in parallel. Most studies focus on one aspect of these three features to achieve energy-efficient process in wireless sensor networks. However, if multiple approximate continuous range queries (ACQ) are injected into the networks and run independently without optimization, they will cause a serious problem—sensors are likely to send reduplicate data for different queries and therefore shorten the networks' lifetime. In this paper, the technology of energy-efficiently processing multiple ACQ is studied, and a data structure called rq-kd-tree to index multi-dimensional range queries is designed, which is used to get the common required data of different queries (query intersecting areas). Based on a concept called query similarity degree, original queries are rewritten and issued to the networks. Finally, detailed experiments on both real data and synthetic data are conducted to prove the proposed methods achieving energy-efficient in processing multiple queries.
Distributed Aggregations for Two Queries over Uncertain Data
Zhou Xun, Li Jianzhong, and Shi Shengfei
2010, 47(5):  762-771. 
Asbtract ( 558 )   HTML ( 0)   PDF (1088KB) ( 586 )  
Related Articles | Metrics
The technique of querying uncertain data is playing an increasingly important role in the fields of military affairs, finance, and telecom. Actually, a lot of uncertain data is generated in distributed systems such as wireless sensor networks, distributed Web servers and P2P systems. Collecting all the uncertain data from such a system to perform centralized queries will lead to immense communication cost, time delay and storage cost. Also, most centralized query methods are not applicable to distributed systems due to the features of uncertain data. In this paper, distributed uncertain Max (Min) and Top-k queries are defined, and designed distributed aggregation algorithms are designed based on several filtering strategies. The algorithms compute the upper bound of the candidate probability for each data entry using the uncertain ranges and probability distributions of the data, and filter the data entries as much as possible, which have no chance to influence the query result. Also, the algorithms merge and keep a relatively small amount of necessary data in the transmission, to compute the final results. The correctness of the filtering strategies has been proved. Simulations show that the algorithms can process the queries correctly and reduce communication cost efficiently with various data and system circumstances.
Modeling Cascading Failures for Internet Based on Congestion Effects
Wang Jian, Liu Yanheng, Mei Fang, and Zhang Cheng
2010, 47(5):  772-779. 
Asbtract ( 411 )   HTML ( 0)   PDF (1879KB) ( 585 )  
Related Articles | Metrics
Internet is a complex network with the characteristics of self-organized criticality. If the nodes with many more connections are attacked, it may lead to dropped efficiency and even abnormal due to overload. The successive traffic on the overloaded nodes is compelled to reroute to avoid the congested nodes. The bypassing may congest other downstream nodes and lead to more traffic detour and node congestion, and then a cascading failure may happen. The cascading dynamics of the Internet are analyzed and two reasons are pointed out which may cause cascading failures. Different from betweenness centrality, the congestion function to represent the congested extent of node is proposed. By introducing the concept of “delay time”, the correlation between permanent removal and non-removal is built, and the flexibility of model is improved. And a new assessing function of network efficiency based on congestion effects is given in order to measure the destruction of cascading failures, which highlights a more meaningful way to measure the damage of cascading failures. Moreover, some impacts of structure and size of topology, delay time, handling capability and generating speed of packets on congestion propagation are also investigated, and congestion propagation process consisting of three phases and some factors affecting transition phenomenon are uncovered.
A Web Services Matching Mechanism Based on Semantics and QoS-Aware Aspect
Zhang Peiyun, Huang Bo, and Sun Yamin
2010, 47(5):  780-787. 
Asbtract ( 443 )   HTML ( 0)   PDF (904KB) ( 466 )  
Related Articles | Metrics
Web service matching is an important step in dynamic services composition. However, most current researches only make the local semantic matching for individual abstract Web service during composition process, not taking the global matching into consideration. Moreover, even though the global matching is considered, there is only QoS-aware matching, not global semantic matching. To achieve all these goals, aimed at abstract Web services composition process, the first is to locate appropriate concrete Web services and obtain them. Based on the declarative semantic description of Web services, two steps of matching are proposed. One is a local semantic matching based on the abstract services and the other is a global semantic matching based on QoS-awareness. The semantic matching algorithm is designed for local semantic matching and the usability of service is extended in the algorithm for input/output matching. A QoS model is built and a corresponding evaluation method is given for matching of service composition process. Based on the QoS model and the evaluation method, a genetic algorithm is proposed to achieve the maximal global semantic matching degree and to satisfy QoS requirement for the whole abstract service composition process. Experimental results and analysis show that Web service matching mechanism based on semantics and QoS-aware aspect is feasible and effective.
Discovering Redundancy-Aware Top-k Anomalies in High Dimensional Data
Chen Guanhua, Ma Xiuli, Yang Dongqing, , Tang Shiwei, Shuai Meng, and Xie Kunqing,
2010, 47(5):  788-795. 
Asbtract ( 507 )   HTML ( 2)   PDF (821KB) ( 450 )  
Related Articles | Metrics
Discovering anomalies is an important data mining task which has been studied in many applications. In this paper, by emphasizing the problems of exception measurement of high dimensional objects and redundancy in the set of anomalies, an approach is proposed to discover the anomalies in high dimensional data. With a bipartite graph representation of the given high dimensional dataset, the capability of compression of each object is used to measure the degree of exception of the object. Based on the exception measure, the dataset containing different types of attributes, such as binary attributes, categorical attributes and numeric attributes, are well supported. To solve the problem of redundancy in the set of top-k anomalies, the concept of redundancy-aware top-k anomalies is proposed. For the problem of mining the exact set of the redundancy-aware top-k anomalies is NP-hard, an algorithm based on greedy heuristics, named k-AnomaliesHD, is designed to discover an approximate set of the redundancy-aware top-k anomalies efficiently. The experimental study both on real and synthetic datasets shows that the algorithm scales linearly with the dimensionality of the dataset and quadratic to the size of the dataset. Further, compared with the redundancy-unaware method, the set of redundancy-aware top-k anomalies is much more effective to cover the abnormal patterns of data.
InfoSigs: A Fine-Grained Clustering Algorithm for Web Objects
Sheng Zhenhua, Wu Yu, Jiang Jinhua, Shou Lidan, and Chen Gang
2010, 47(5):  796-803. 
Asbtract ( 591 )   HTML ( 0)   PDF (1297KB) ( 452 )  
Related Articles | Metrics
Clustering of objects in Web (IR) documents has recently become a hot topic in the research community of Web information retrieval (IR). Generally, quality Web IR requires fine-grained clustering of objects in documents. However, the present clustering algorithms are mostly confined to the level of sentence structure or textual topic. The lack of consideration of token information for identifying more detailed-level objects often leads to coarse-grained clustering results. To address this problem, the authors propose a novel fine-grained clustering algorithm named InfoSigs, which captures the token information signatures inside Web documents. The work contains two contributions: Firstly, techniques are presented to construct a directed acyclic graph of information-transmission from token frequency sequences implying probabilistic hierarchy property between tokens. Each token feature is given a weight value based on the aggregated information distribution obtained from the signatures in the graph. Secondly, a self-tuning method is proposed for merging records that are of high similarity. This can effectively reduce the impact from noises. The experiments on real datasets show that the proposed InfoSigs algorithm outperforms the conventional algorithms, such as I-Match and Shingling, with average improvements of 21.3% in terms of the F-Measure. The results indicate that InfoSigs is able to effectively generate more fine-grained clustering results compared with the conventional methods.
XCluster: A Cluster-Based Queriable Multi-Document XML Compression Method
Zhao Ming, Luo Jizhou, Li Jianzhong, and Gao Hong
2010, 47(5):  804-814. 
Asbtract ( 452 )   HTML ( 0)   PDF (1633KB) ( 419 )  
Related Articles | Metrics
XML is the de facto standard for data exchange and data storage in network applications. The main problem in the management of XML data is the redundancy caused by its mingling structure and data, which causes high costs in storing, exchanging and processing of XML data. Data compression techniques can be used to reduce such redundancy. However, most of the existing XML compression methods only try to reduce the redundancy in each single XML document, while ignoring the redundancy among XML documents. Presented in this paper, is a new XML compression method XCluster, which utilizes the similarity among XML documents. Queries can be evaluated on the compressed XML documents generated by XCluster directly. XCluster uses the improved pq-gram approximate distance between root-ordered tag trees to cluster the input XML documents hierarchically first. Then it compresses the structures in each clustered subset of XML documents by obtaining a representative structure through merging operation. Finally, it puts data of nodes with same tags into same buckets and encodes data in each bucket with a suitable algorithm according to the type of data. Extensive experiments on both real datasets and synthetic datasets show that XClutster outperforms XGrind and XQilla in both compression ratio and efficiency of query processing.
An Enhancement Algorithm of Cluster Boundaries Precision Based on Grid's Density Direction
Yu Canling, Wang Lizhen, and Zhang Yuanwu
2010, 47(5):  815-823. 
Asbtract ( 434 )   HTML ( 1)   PDF (1142KB) ( 489 )  
Related Articles | Metrics
The grid-based clustering approach uses a multi-resolution grid data structure. It quantizes the object space into a finite number of cells that form a grid structure on which all of the operations for clustering are performed. Existing grid-based clustering algorithms are efficient, but the clustering quality is not very good, especially when dealing with the objects in fringes, the clustering results are not accurate. In order to resolve such problems, a preprocess algorithm based on grid density direction is proposed in this paper. The method is derived from Newton's universal law of gravitation, that is, the smaller the distance between objects, the larger their quality, the more attractive. Similarly, the density inside a cluster is larger than its boundary. That is to say that there is larger gravitation inside a cluster. Therefore, if a grid's density increases at the opposite directions synchronously (that is the case of the extrusion), the grid need to be further refined, which is to determine whether the grid is the edge of cluster grids, and determine the extrusion directions of the objects in the edge of cluster grids. The experimental results show that the new method can enhance cluster boundaries precision effectively and has a higher cluster recognition rate, so it is very useful as a preprocess algorithm of a clustering.
Multi-Schema Integration Based on Usage and Clustering Approach
Ding Guohui, Wang Guoren, and Zhao Yuhai
2010, 47(5):  824-831. 
Asbtract ( 467 )   HTML ( 1)   PDF (893KB) ( 427 )  
Related Articles | Metrics
Data integration is an effective solution to the problem of multiple data sources consolidation. It is of great importance to integrate schemas of multiple data sources accurately and efficiently. Although there have been a large number of researches on schema integration, they all neglect the history usage information of user which is a very important factor for improving the quality of schemas integration. In this paper, a novel clustering-based multi-schema integration method is proposed, which takes advantage of the usage information of the user. Firstly, a feature vector is extracted for each attribute of source schemas from the query log of a database, over which clustering is performed. Secondly, according to minimal difference among resulting clusters, a maximal similarity threshold is introduced to detect all intra-cluster exceptional points of different semantics for each resulting cluster. The points are departure core exceptional point, same source exceptional point, and excursion exceptional point respectively. Finally, aiming at three kinds of exceptional attributes within a resulting cluster, three exceptional points eliminating rules are proposed respectively, based on which a novel exceptional points eliminating algorithm, namely EPKO, is designed. Experimental results show that the proposed method can solve the problem of multiple schemas integration accurately and efficiently.
(中国人民大学信息学院 北京 100872) (cadizhou@gmail.com)
HF-Tree: An Update-Efficient Index for Flash Memory
2010, 47(5):  832-840. 
Asbtract ( 602 )   HTML ( 0)   PDF (1399KB) ( 390 )  
Related Articles | Metrics
(Information School, Renmin University of China, Beijing 100872)
Efficient XML Keyword Query Refinement with Meaningful Results Generation
Huang Jing, Lu Jiaheng, and Meng Xiaofeng
2010, 47(5):  841-848. 
Asbtract ( 496 )   HTML ( 2)   PDF (1001KB) ( 406 )  
Related Articles | Metrics
search method provides users with a friendly way to query XML data, but a users keyword query may often be an imperfect description of their intention. Even when the information need is well described, a search engine may not be able to return the results matching the query as stated. The task of refining the users original query is first defined to achieve better result quality as the problem of keyword query refinement in XML keyword search, and guidelines are designed to decide whether query refinement is necessary. Four refinement operations are defined, namely term deletion, merging, split and substitution. Since there may be more than one query refinement candidates, the definition of refinement cost is proposed, whic,h is used as a measure of semantic distance between the original query and refined query, and also proposed is a dynamic programming solution to compute the refinement cost. In order to achieve the goal of finding the best refined queries and generate their associated results within a one-time node list scan, a stack-based algorithm is proposed, followed by a generalized partition-based optimization, which improves the efficiency a lot. Finally, extensive experiments have been done to show efficiency and effectiveness of the query refinement approach.
A Multi-Order Based Index Structure for Spatial Data—MOIS-tree
Liu Runtao and Hao Zhongxiao,
2010, 47(5):  849-857. 
Asbtract ( 459 )   HTML ( 0)   PDF (647KB) ( 414 )  
Related Articles | Metrics
An index structure, MOIS-tree for spatial data, is proposed by combining the division for data space with B-tree and R-tree at the aim of improving query efficiency, which is a brand new way to process range query. The definitions of the four kinds of orders, in which spatial data are ordered according to their MBRs, are given. Based on the orders, the definition of MOIS-tree is given. In the MOIS-tree the children nodes of each middle node are ordered according to their geometric locations so that the position can be located effectively to find queried results quickly when range query is processed in a middle node. Besides, the check of query window's containing a middle node in the range query algorithm of new index structure is introduced to reduce a great number of noneffective intersection tests in general query algorithms. Thus the query efficiency is achieved greatly in another aspect. The algorithms for constructing a MOIS-tree and node insertion, and the proofs of the algorithms' correctness and termination are presented and their time complexities are given. Finally, the algorithm for range query is obtained and the analysis for its properties is condacted. The experimental results show that the speed of range query on MOIS-tree is greatly improved.
D-EEM: A DOM-Tree Based Entity Extraction Mechanism for Deep Web
Kou Yue, Li Dong, Shen Derong, Yu Ge, and Nie Tiezheng
2010, 47(5):  858-865. 
Asbtract ( 574 )   HTML ( 6)   PDF (1316KB) ( 595 )  
Related Articles | Metrics
With the increase of Web databases, accessing Deep Web is becoming the main method to acquire information. Because of the large-scale unstructured content, heterogeneous result and dynamic data in Deep Web, there are some new challenges for entity extraction. Thus it is important to solve the problem of extracting the entities from Deep Web result pages effectively. By analyzing the characteristics of result pages, a DOM-tree based entity extraction mechanism for Deep Web (called D-EEM) is presented to solve the problem of entity extraction for Deep Web. D-EEM is modeled as three levels: expression level, extraction level, collection level. Therein the components of region location and semantic annotation are the core parts to be researched in this paper. A DOM-tree based automatic entity extraction strategy is performed in D-EEM to determine the data regions and entity regions respectively, which can improve the accuracy of extraction by considering both the textual content and the hierarchical structure in DOM-trees. Also based on the Web context and co-occurrence, a semantic annotation method is proposed to benefit the process of data integration effectively. An experimental study is proposed to determine the feasibility and effectiveness of the key techniques of D-EEM. Compared with various entity extraction strategies, D-EEM is superior in the accuracy and efficiency of extraction.
Processing XPath over F&B-Index
Wang Hongqiang, Li Jianzhong, and Wang Hongzhi
2010, 47(5):  866-877. 
Asbtract ( 357 )   HTML ( 0)   PDF (2262KB) ( 335 )  
Related Articles | Metrics
XML is widely used as a standard of information exchange and representation. Queries on XML can retrieve a subset of XML data nodes satisfying certain constraints. Queries on large XML data are usually processed in two steps: 1. An index of XML nodes is created; 2. Queries are processed on that index without accessing XML data. XML index provides high efficiency for XML query processing. Particularly, F&B-index is the smallest index that supports twig query processing. However, few researches are proposed on how to efficiently create F&B-Index and how to process queries based on F&B-Index. Proposed in this paper is a new labeling scheme called prime sequence. This labeling scheme helps not only on creating an F&B-Index but also on efficient query processing. With prime sequence labeling, an F&B-index can be created by parsing the XML document only once with a SAX parser. Further, region encoding and CCPI on F&B-Index can be created during the creation of F&B-Index. Thus, TwigStack algorithm and related-path-join algorithm can be used to process queries on created F&B-Index. Also proposed is an efficient algorithm named division match over F&B-Index. The algorithm can efficiently judge relationship between two nodes based on a property of prime sequence labeling. Experiments show that prime sequence labeling provides high efficiency on creating F&B-Index and high efficiency on query processing on different datasets.
Research on Simulative Column-Storage Model Policy Based on Row-Storage Model
Yu Lisheng, Zhang Yansong, Wang Shan, and Zhang Qian
2010, 47(5):  878-885. 
Asbtract ( 407 )   HTML ( 2)   PDF (1402KB) ( 679 )  
Related Articles | Metrics
Column-storage model has outstanding performance in read-only data warehouse applications. Many researches show that, for typical OLAP (online analytical processing) queries, column-storage database has better performance than traditional row-storage database. In this paper, according to the characters of column-storage model and its particular data processing pattern, the authors propose a simulative column-storage model based on the traditional row-storage relational model. In the simulative column-storage model, they reorganize and then store each column of all the original tables as a new independent relational table, and provide a data processing model similar to existing column-storage data processing to reduce I/O cost of query processing. Additionally, they also propose optimized simulative column-storage models based on clustering relative columns and full indexed model respectively for improving the performance of OLAP queries especially. Experiments among five data storage models show that, the simulative column-storage model has good performance both in data accessing and in OLAP query processing, and the overall performance is between traditional row-storage model and existing column-storage model. However, in real applications, users may need large additional investment for deploying an almost new system of existing column-storage model to improve the query performance, and the simulative column-storage model based on the traditional row-storage relational model can distinctly reduce the cost of system redeployment. Specially, it provides better performance for OLAP applications.
SLCA Algorithm for XML Streams Based on Hole-Filler Model
Huo Huan, Wang Guoren, Chen Qingkui, and Peng Dunlu
2010, 47(5):  886-892. 
Asbtract ( 493 )   HTML ( 0)   PDF (1159KB) ( 471 )  
Related Articles | Metrics
Unlike in traditional databases, queries on XML streams are bounded not only by memory but also by real time processing. A novel technique for keyword search over streamed XML fragments is presented, which adopts broadcast model and hole-filler model for XML fragments dissemination, addressing the problem of disordered fragment transmission and considering the quality of searching results due to either keyword mismatch or data absence. Two efficient indexes for candidate elements are developed to further improve the performance: Hierarchical hash table and LCA table. The former indexes structure keywords which act as the structure of result, while the latter indexes the condition keywords which refine the keyword search condition. SLCA computing algorithm, which is triggered by condition keywords, only computes the candidate fragments that involve keywords, thus avoiding redundant operations that will not contribute to the final result. The algorithm produces part of the matched answers continuously without having to wait for the end of the stream. The experiments evaluate the performance of the SLCA algorithm with different types of keywords, different document fragmentation and different keyword frequencies, and compare the SLCA algorithm with other XML keyword matching algorithms. The experiment study shows that the SLCA algorithm performs well on saving processing power and memory space.
Anomaly Detection over Pseudo Period Data Streams Based on DTW Distance
Cheng Wencong, Zou Peng, Jia Yan, and Yang Yin
2010, 47(5):  893-902. 
Asbtract ( 720 )   HTML ( 0)   PDF (1057KB) ( 556 )  
Related Articles | Metrics
Pseudo period data streams appear in a lot of applications, especially in monitoring domains. The anomalies detected over pseudo period data streams may possess significant domain knowledge which is worth to do further analysis. When Euclidean distance between time series changes greatly with the compared time series moving slightly along the time-axis, DTW (dynamic time warping) distance is suggested as a more robust distance than Euclidean distance. In this paper DTW distance is adopted as similarity measure of different wave sections in pseudo period data streams, and then the anomaly wave sections are defined, which have few historical similar counterparts based on that similarity measure. A nave algorithm is given to detect the anomaly wave sections by directly computing the DTW distance between the current wave section and all other wave sections in the historical dataset. However, the efficiency of the nave algorithm is very poor which limits its application. So a fast approximate algorithm based on the cluster index is proposed to speedup the nave method. Compared with the nave algorithm, this new method is much faster in speed and no big degrades in accuracy. Extensive experiments on the real dataset demonstrate the effectiveness of the proposed methods.
Research on Network Bidirectional Topology Discovery Based on Measurer by Spreading
Jiao Jian,Yao Shan, and Li Xiaojian,
2010, 47(5):  903-910. 
Asbtract ( 427 )   HTML ( 0)   PDF (1835KB) ( 436 )  
Related Articles | Metrics
The characteristics of route protocol and access control in computer network make the topology have some phenomena such as single-direction and asymmetry. For these reasons, some links can not to be found during the topology discovery. In order to resolve this issue, a bidirectional topology discovery protocol (BTDP) is proposed, which is based on measurer spreading and sub/pub mechanism. The measurers who want to make use of this protocol can probe for other's links which can not be discovered by oneself from destination to source. An automaton model of this protocol is created. Analyzing and validating the model prove that the protocol's computability can be end in logic. Further we give the algorithms for the protocol, including the main part which finishes the process between source and destination, and the fusion part which is to judge the links whether they are the same and to fuse them. Finally the algorithm is implemented with program. Through the plus of QQ instant communication software, a group can be deployed to implement the topology discovery. The experiment in which program is running on the Internet of China show that BTDP can find the asymmetrical paths in this network. On the other hand, the fusion data coming from multi-measurers reveal some hide-links in the above environment.
An Efficient and Secure Group Key Management Scheme in Mobile Ad Hoc Networks
Wang Gang, Wen Tao, Guo Quan, and Ma Xuebin
2010, 47(5):  911-920. 
Asbtract ( 397 )   HTML ( 0)   PDF (1004KB) ( 556 )  
Related Articles | Metrics
The design of group key management schemes and protocols, whose main objective is to provide secure and reliable communication, is one of the hot topics in the basic research field of secure mobile ad hoc network. However, existent group management schemes are not suited to MANET due to its intrinsic properties such as being dynamic, node resource constraints and no fixed infrastructure. In order to overcome the drawback, an efficient and secure group key management scheme (ESGKM) is proposed in this paper. ESGKM does not require a trusted dealer and only runs an interactive protocol to generate group sharing secret key among n parties. The scheme can adapt to topological change automatically and increase the security of the protocol. The application of ECC and bilinear pairing improves the performance of group key generation algorithm and the verification of the shares of sub-secret and group secret further enhances the security of the protocol. In the scheme, a group rekeying and group key consistency management algorithm based on group key service center (GKSC) is also proposed, which can reduce communication and computation overheads of the protocol effectively the and avoid the problem of isolated nodes caused by the group key inconsistency. Strand spaces model is used to prove the correctness and security of ESGKM. The performance analysis results show that the proposed scheme can reduce effectively resource cost, adapt the characteristics of MANET and is clearly superior to the existing BD, A-GDH and TGDH protocols.
A Malicious Transaction Detection Method Based on Transaction Template
Dai Hua, Qin Xiaolin, and Bai Chuanjie
2010, 47(5):  921-929. 
Asbtract ( 373 )   HTML ( 1)   PDF (829KB) ( 368 )  
Related Articles | Metrics
Malicious transaction detection technique is one of important issues in database intrusion detection area. Immediate detection of the malicious transactions is the basis for building a survivable database system. Based on the study of existing malicious transaction detection methods, a novel detecting mechanism based on the database transaction template is proposed. First, fine-grained SQL statement feature vector is defined. The vector contains logical structure of condition clause by expanding the analysis granularity on SQL statements. Second, database transaction template is proposed which has two aspects: one is the SQL statements directed graphs, which contain the transaction's SQL statements feature vectors and the executing sequence of database operations, the other is execution environment constraints, which represent the transaction's execution requirements, such as time constraints, location constraints, operational constraints, etc. Finally, a malicious transaction detection algorithm based on database transaction template is provided, which integrates the virtues of the template and is based on a decision algorithm called template support. To validate the effectiveness of the proposed detection method, experiments on transaction executing performance, various detection types and malicious transaction detection rates are performed. Experimental results indicate that the proposed method has good detection performance and ability, and can be applied in wider detection scopes.
Distance-Based Outlier Detection for Distributed RFID Data Streams
Liao Guoqiong, and Li Jing
2010, 47(5):  930-939. 
Asbtract ( 456 )   HTML ( 0)   PDF (1461KB) ( 464 )  
Related Articles | Metrics
Recently, RFID technologies have been widely used in many fields, such as real-time monitoring, object identification and item tracing, so it is very important to detect abnormal states of tagged objects in time. However, due to various environmental factors and unreliability of wireless communication technology, the data collected by the RFID readers are often noisy. According to the characteristics of distributed RFID data stream, such as huge-volume, variability, unreliability and distribution, the authors propose a distance-based local stream outlier detection algorithm (LSOD) and an approximate estimate-based global stream outlier detection algorithm (GSOD). LSOD need maintain a data stream structure, CSL, to identify the safe inlier. With the help of the characteristics of safe inliers, LSOD not only can reduce the memory space of stream data, but also can save the query time on stream objects. Under the distance-based outlier definition, it is a fact that global outlier on center node is a subset of the union of local outliers on every distributed node. Thus, GSOD uses a sample-based method to estimate approximately the global outliers for reducing the communication volume and calculate load of the centre node. Finally, several experiments have been accomplished, confirming that the proposed algorithms have the characteristics such as short running time, small memory space and high accuracy.
A Novel Rule-Centered Fuzzy Model Induced by Epanechnikov Mixtures
Zhang Qinli, and Wang Shitong
2010, 47(5):  940-947. 
Asbtract ( 578 )   HTML ( 2)   PDF (925KB) ( 381 )  
Related Articles | Metrics
This work explores how mixture model based on Epanechnikov kernel functions, termed here as Epanechnikov mixture model (EMM), can be translated into a rule-centered generalized fuzzy model (RCGFM) with multidimensional Epanechnikov membership functions that take into consideration of the correlation among components of sample data and therefore result in a more effective partition of the input space. Thus, the new fuzzy model can be interpreted probabilistically, i.e., the conditional means of an EMM corresponds to the defuzzified output of a RCGFM and the coefficients represented by the means and/or variances in probability theory in the consequent polynomials of fuzzy rules can be exactly interpreted as Taylor series coefficients. The differential theory is combined with the fuzzy theory. Furthermore, the proposed model yields a smaller number of fuzzy rules and inputs, because the compactly supported polynomial Epanechnikov membership functions minimize the asymptotic mean integrated squared error with the optimal kernel efficiency. As demonstrated by the experimental results in several benchmarking datasets, the EMM linked RCGFM as the fuzzy model EMM-RCGFM is superior to other fuzzy models in modeling accuracy and robustness, besides generating a small set of probabilistically interpretable fuzzy rules. This work represents a very first attempt to exhibit the applicability of Epanechnikov kernel functions as fuzzy membership functions.
Research on Cnvergence of Multi-Robots Path Planning Based on Learning Classifier System
Shao Jie, Yang Jingyu, Wan Minghua, and Huang Chuanbo
2010, 47(5):  948-955. 
Asbtract ( 507 )   HTML ( 1)   PDF (1533KB) ( 505 )  
Related Articles | Metrics
Learning classifier systems(LCS) are rule-based inductive learning systems that have been widely used in the field of reinforcement learning over the last few years, but seldom used in the multi-robots domain. In this paper a distributed learning classifier system, which combines reinforcement learning and genetic algorithm to create a set of rules on-line, is used to design optimal paths for multi-robots path planning. Due to premature convergence, local optimal solution, needing a larger storage space and other shortcomings of genetic algorithms,and targeted at the different effects of the static and dynamic environment, the authors design different fitness function in static and dynamic environment. They have derived and proved that the credit allocation algorithm is convergent and provides a theoretical guarantee for the path planning algorithms. Simulation results also show that the genetic algorithm and learning classifier system combination for multi-robots path planning is effective. Premature convergence, local optimal solution, needing a larger storage space and other shortcomings of the genetic algorithm have been significantly improved. The proposed new approach has increased multi-robots' ability to quickly find safe paths. So LCS has a very broad application prospects in the field of robotics and also is the future research directions.
A Storage Optimization Method for Frequency Direct Digital Synthesizer
Zhang Kehuan, Ren Xiaoxi, Li Renfa, and Ling Chunqing
2010, 47(5):  956-961. 
Asbtract ( 400 )   HTML ( 2)   PDF (880KB) ( 432 )  
Related Articles | Metrics
Direct digital synthesis (DDS) is one type of advanced frequency synthesizing technology which uses the sine phase-amplitude mapping table stored in the read-only memory (ROM) as its core component to convert the phase value into amplitude value. However, this look-up table is large and will occupy lots of precious on-chip transistor resources. In order to reduce the consumption of on-chip memory resource, proposed in this paper is a new and optimized mapping scheme which stores only the amplitude difference between two adjacent phase sampling points instead of the full values, and uses an accumulator to add this amplitude difference with the amplitude at previous phase point. Analysis results show that the new scheme uses only twenty percent of the resources used by current technology. The implementation technology is discussed and finally a prototype is built on FPGA to demonstrate the feasibility and the performance. The max frequency is also evaluated on the prototype system. The evaluation results show that: when compared with the traditional mapping and storage scheme, the proposed new scheme can save a large amount of on-chip RAM resource, and is able to achieve the same maximum working frequency, but with a minimum cost by requiring only a little more logical resource.
Dual-Layer CRFs Based on Subword for Chinese Word Segmentation
Huang Degen, Jiao Shidou, and Zhou Huiwei
2010, 47(5):  962-968. 
Asbtract ( 604 )   HTML ( 1)   PDF (727KB) ( 577 )  
Related Articles | Metrics
A subword based dual-layer CRFs (conditional random fields) method for Chinese word segmentation is proposed, which aims to solve the problem of word segmentation disambiguation and unknown words recognition. Previous work in CRFs reported that the subword-based tagging outperforms the character-based tagging in all comparative experiments. However, subwords-based tagging often produces errors of cross word boundaries. This method is established on sequence labeling methods based on subwords, which are selected with a subword filtering algorithm. The learning process is divided into two: one for learning the first layer subword tagging CRF with character-based tagging, and the other for learning the second layer word tagging CRF with subword-based tagging. In word sequence labeling process, the first layer uses subword tagging CRFs model to recognize the subwords in testing corpora for reducing error rate generated by label spanning, and the second layer is used to subword-based sequence labeling and then to test the output of first layer to get the final result. The proposed method is evaluated using test data from SIGHAN Bakeoff 2006. F-score of 93.3% and 96.1% are achieved respectively in UPUC corpora and MSRA corpora. The experimental results show that this method can gain state-of-the-art performance on Chinese word segmentation.