ISSN 1000-1239 CN 11-1777/TP

Table of Contents

15 October 2012, Volume 49 Issue 10
Research on Uncertain Skyline Query Processing Techniques
Wang Yijie, Li Xiaoyong, Yang Yongtao, Qi Yafei, and Wang Guangdong
2012, 49(10):  2045-2053. 
Uncertain data is now widespread in many practical applications, such as sensor networks, RFID networks, location-based services, mobile object management, online shopping, and market surveillance. The uncertain skyline query, an important aspect of uncertain data management, has recently received considerable attention in the database and network computing fields due to its importance in applications such as decision making, market analysis, and environment surveillance. Firstly, existing definitions of skyline queries over various kinds of uncertain data are presented extensively, including definitions over discrete and continuous probability distribution models and over incomplete data. Secondly, based on an analysis of the characteristics of skyline queries, existing centralized and distributed skyline query approaches are reviewed, with emphasis on their principles as well as their advantages and disadvantages. Thirdly, the definition of skyline queries over uncertain data streams is introduced and existing studies on such queries are surveyed. Finally, based on the latest research, some directions for future research on uncertain skyline queries are outlined.
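As background for the survey above, the dominance test that every skyline definition builds on can be sketched as follows (a minimal certain-data version with smaller-is-better semantics; the uncertain variants surveyed here generalize this test with probabilities, and the function names are illustrative):

```python
def dominates(a, b):
    """Point a dominates b if a is no worse in every dimension
    and strictly better in at least one (smaller is better here)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def skyline(points):
    """Naive O(n^2) skyline: keep the points dominated by no other point."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]
```

For example, among the points (1, 5), (3, 3), (5, 1), (4, 4), the first three form the skyline because (4, 4) is dominated by (3, 3).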
Estimation on Worst-Case Execution Time of Real-Time Complex Event Processing
Li Xiang, Fan Yushun, Wang Hongan, and Qiao Ying
2012, 49(10):  2054-2065. 
A real-time complex event processing (CEP) system detects complex events from a primitive event stream and must guarantee that event-processing tasks complete by their deadlines. To provide that guarantee, a key problem is estimating the worst-case execution time (WCET) of the CEP program. In current WCET estimation methods for general programs, the range of the execution count of each sub-program must be annotated by developers. In a CEP program, however, the execution-count ranges of the sub-programs that detect sub-event patterns are hard to obtain directly because of the program's complexity. Although the execution counts of different sub-programs are related and ranges can be solved from these relations, the resulting ranges are not tight enough, which reduces estimation accuracy. Thus current methods cannot accurately estimate the WCET of CEP programs. This paper presents a novel WCET estimation method for CEP programs. To avoid the annotation difficulty, constraints among the execution counts of sub-programs are annotated instead of ranges of those counts; the constraints are generated from the detection structures used by the CEP program. Simulation results indicate that the method is effective and achieves higher accuracy.
Partition-Based Set Similarity Join
Rong Chuitian, Xu Tianren, and Du Xiaoyong,
2012, 49(10):  2066-2076. 
Set similarity join is a primitive operation in a variety of applications. Its goal is to find, in a dataset, all record pairs whose similarity is not below a given threshold under set similarity constraints. It has become a popular topic and attracted much attention from the database community. The recent proliferation of social networks, mobile applications, and online services has increased the rate of data gathering, which brings new challenges to set similarity join. In this paper, we propose a new partition-based set similarity join method. Each set is adaptively partitioned into even partitions based on the pigeonhole principle, and the partitions are used to filter out false positives efficiently. To filter out still more false positives and further improve efficiency, an enhanced method is proposed that exploits the position of each partition. Extensive experiments are carried out on two real datasets with different settings. The results demonstrate that set similarity join with our partition-based filtering method is more efficient than state-of-the-art methods.
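A minimal sketch of the pigeonhole-style partition filter described above (illustrative only, not the paper's algorithm: it assumes the similarity threshold has already been translated into "matching sets differ in fewer than p elements", so two similar sets must agree on at least one of their p partitions):

```python
from collections import defaultdict

def partitions(s, universe, p):
    """Split a set's elements into p even partitions of a fixed global order."""
    order = {e: i for i, e in enumerate(sorted(universe))}
    buckets = [[] for _ in range(p)]
    for e in s:
        buckets[order[e] * p // len(universe)].append(e)
    return [frozenset(b) for b in buckets]

def candidate_pairs(sets, universe, p):
    """Pigeonhole filter: index each partition; sets sharing an identical
    partition in the same position become candidate pairs."""
    index = defaultdict(list)
    cands = set()
    for sid, s in enumerate(sets):
        for pid, part in enumerate(partitions(s, universe, p)):
            for other in index[(pid, part)]:
                cands.add((other, sid))
            index[(pid, part)].append(sid)
    return cands
```

Candidates still require exact verification; the filter only prunes pairs that cannot pass the threshold under the stated assumption.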
Query Results Authentication of Outsourced Append-Only Databases
Wen Tao, Sheng Gang, Guo Quan, and Sheng Guojun
2012, 49(10):  2077-2085. 
In database outsourcing, database management is delegated to a professional third party, and one of the critical problems is integrity authentication of query results. To the best of our knowledge, we are the first to study the problem of outsourced append-only databases. According to the features of an outsourced append-only database, a new authentication data structure, the Min-Max hash tree, is proposed based on existing authentication data structures; clients can use it to authenticate query results. For the data owner side, a basic data forwarding algorithm is proposed; for the service provider side, an algorithm for querying the rise, fall, or oscillation amplitude of prices and query result authentication algorithms are proposed for both one-shot and continuous queries. Finally, optimizations are proposed for data forwarding, for storage of the authentication data structure on the data owner side, and for continuous queries on the service provider side, which greatly diminish storage space and further improve overall data processing efficiency. Extensive experiments demonstrate the effectiveness and efficiency of the proposed Min-Max hash tree: it can be used to authenticate query results and to process massive data.
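Hash-tree authentication structures such as the Min-Max hash tree build on the classic Merkle tree, whose root computation can be sketched as follows (a generic sketch, not the paper's structure; the min/max annotations on internal nodes are omitted):

```python
import hashlib

def merkle_root(leaves):
    """Merkle-tree root over string data items: hash the leaves, then
    repeatedly hash adjacent pairs (duplicating the last node of an odd
    level) until one root digest remains. The owner signs this root;
    clients recompute it from returned results plus sibling hashes."""
    level = [hashlib.sha256(x.encode()).hexdigest() for x in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])
        level = [hashlib.sha256((level[i] + level[i + 1]).encode()).hexdigest()
                 for i in range(0, len(level), 2)]
    return level[0]
```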
VPM: Materialization Based on Path with Values in Column-Stores
Ding Xiangwu, Yu Wenbing, and Liu Guohua,
2012, 49(10):  2086-2094. 
Materialization is one of the key issues for query execution in column stores because it directly influences query performance. It is therefore important to design materialization strategies and related techniques for column stores. Existing late materialization may re-read the same data blocks. This paper proposes a materialization technique based on paths with values (VPM). Firstly, a new descriptor structure, called the passing block, is defined for intermediate results during physical execution, in which the position information of values is stored separately from the values themselves. Based on this, for a given physical query tree, all effective paths with values from scan or extraction nodes to ancestor nodes are generated according to whether the ancestors need the values. Along a path with values, the column values are saved in the value area of the passing block if ancestor nodes need them; otherwise only the position list is saved. During query execution, physical operators access data directly from passing blocks, which effectively reduces unnecessary I/O cost. Consequently, VPM improves query execution performance in column stores. Experimental results on the SSB benchmark dataset show the effectiveness of the proposed method.
BTreeU-Topk: Binary-Tree Based Top-k Query Algorithms on Uncertain Data
Zhang Hui, Zheng Jiping, and Han Qiuting
2012, 49(10):  2095-2105. 
The development of applications in uncertain data management gives rise to various kinds of queries. Among these, the Top-k query, which returns the k most important objects based on some characteristic values, has attracted considerable research attention. Because of the uncertainty of application data, traditional Top-k methods and techniques cannot be directly applied to uncertain data. In this paper, a new binary-tree-based Top-k algorithm named BTreeU-Topk is developed from existing Top-k algorithms on uncertain data. BTreeU-Topk uses a binary tree to simplify the representation of the possible-worlds space. To further improve efficiency, pruning techniques are adopted and the BTreeOPTU-Topk and BTreePU-Topk algorithms are proposed: BTreeOPTU-Topk uses an optimized queue to prune branches that cannot appear in the final results, while BTreePU-Topk uses a pruning rule. Experimental results show that the proposed BTreeU-Topk, BTreeOPTU-Topk, and BTreePU-Topk algorithms are superior to existing algorithms under different data distributions and increasing values of k. In addition, the proposed algorithms reduce the number of maintained states.
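The possible-worlds semantics that these algorithms accelerate can be illustrated with a brute-force U-Topk evaluation over independent tuples (an exponential baseline sketch, not the paper's binary-tree algorithm; tuple format and names are illustrative):

```python
from collections import defaultdict

def u_topk(tuples, k):
    """Enumerate all possible worlds of independent uncertain tuples and
    return the most probable top-k answer set (U-Topk semantics).
    Each tuple is (id, score, existence_probability)."""
    world_prob = defaultdict(float)
    n = len(tuples)
    for mask in range(1 << n):          # one mask per possible world
        p = 1.0
        present = []
        for i, (tid, score, prob) in enumerate(tuples):
            if mask >> i & 1:
                p *= prob
                present.append((score, tid))
            else:
                p *= 1 - prob
        topk = tuple(tid for _, tid in sorted(present, reverse=True)[:k])
        world_prob[topk] += p
    return max(world_prob.items(), key=lambda kv: kv[1])
```

The 2^n blow-up is exactly why tree-structured state sharing and pruning, as in the paper, are needed in practice.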
View-Tree-Based Dynamic View Selection
Lin Ziyu, Zou Quan, Lin Chen, Lai Yongxuan, and Zheng Wei
2012, 49(10):  2106-2117. 
User-oriented materialized views can greatly improve OLAP query performance. However, available cache management methods cannot deal with dynamic view selection, since they do not take into account the data access patterns of specific users' OLAP queries. In this paper, the concepts of view path and view tree are proposed to organize views. A method called reverse path growing is proposed to quickly compute the view path for a newly arrived query, greatly reducing query response time. Furthermore, an effective view replacement method based on reserved view paths is designed to better handle dynamic adjustment of the view tree. Extensive experiments show that the proposed method outperforms previous ones.
On-Line Dynamic Index Hybrid Update Scheme Based on Self-Learning of Allocated Space
Liu Xiaozhu, and Peng Zhiyong
2012, 49(10):  2118-2130. 
To improve the time and space efficiency of index maintenance, an on-line dynamic index hybrid update (ODIHU) technique based on self-learning of allocated space is proposed. Based on Zipf's law, ODIHU estimates the numbers of short and long inverted lists through theoretical analysis, and manages both with a uniform link-based storage model that distinguishes long from short lists. ODIHU manages long-list space with history-based adaptive learning allocation (HALA), and short-list space with linear allocation (LA), exponential allocation (EA), and uniform allocation (UA). To decrease indexing and retrieval cost, ODIHU divides the index data set into limited sections and controls index merging with dedicated schemes: short lists are merged immediately, while long lists are merged with an improved Y-limited contiguous multiple merge scheme, which effectively balances the trade-off between time and space efficiency. Based on the proposed RABIF, ODIHU considers updating at both the index level and the inverted list level, and effectively improves the time and space efficiency of index updating.
A Highly Scalable RDF Data Storage System
Yuan Pingpeng, Liu Pu, Zhang Wenya, and Wu Buwen
2012, 49(10):  2131-2141. 
As RDF (Resource Description Framework) is flexible to express and easy to interchange, the volume of RDF data is increasing rapidly. TripleBit aims to provide efficient data storage and query processing for large-scale RDF data in several respects. TripleBit employs delta compression and variable-length integer encoding to reduce storage space. Data tables are partitioned into chunks, which not only facilitates buffer management but also makes the data more compact, thereby accelerating query processing. Heuristic rules are employed to generate query plans dynamically. Besides, a two-stage execution strategy is used for multiple-variable queries, which reduces intermediate results. Performance is compared with state-of-the-art RDF stores such as RDF-3X and MonetDB. Experimental results demonstrate that TripleBit saves at least 40% of storage space while substantially improving query processing speed.
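Delta compression plus variable-length integer encoding, as mentioned above, can be sketched like this for a sorted id list (a generic sketch of the two techniques; TripleBit's actual on-disk encoding may differ):

```python
def varint_encode(nums):
    """Delta + variable-byte encoding for a sorted list of non-negative ids:
    store the gap to the previous id, 7 payload bits per byte, with the
    high bit marking 'more bytes follow'."""
    out = bytearray()
    prev = 0
    for n in nums:
        gap = n - prev
        prev = n
        while gap >= 0x80:
            out.append((gap & 0x7F) | 0x80)
            gap >>= 7
        out.append(gap)
    return bytes(out)

def varint_decode(data):
    """Inverse of varint_encode: rebuild gaps, then prefix-sum back to ids."""
    nums, cur, shift, prev = [], 0, 0, 0
    for b in data:
        cur |= (b & 0x7F) << shift
        if b & 0x80:
            shift += 7
        else:
            prev += cur
            nums.append(prev)
            cur, shift = 0, 0
    return nums
```

Small gaps between consecutive ids compress to a single byte each, which is where most of the space saving comes from.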
An Index Supporting Spatial Approximate Keyword Search on Disks
Wang Jinbao, Gao Hong, Li Jianzhong, and Yang Donghua
2012, 49(10):  2142-2152. 
A spatial approximate keyword query consists of a spatial condition and a set of keywords as the textual condition. The spatial condition requires the returned objects to be inside a spatial region or near a location, and the textual condition requires them to be labeled with keywords similar to the queried ones. Such queries enable users to find objects of interest in a spatial database while tolerating mismatches between the users' query keywords and the objects' keywords. With the rapid growth of data, spatial databases storing objects from diverse geographical regions can no longer be held in memory, so it is essential to answer spatial approximate keyword queries over disk-resident datasets. Existing work either returns incomplete answers or indexes in memory; effective disk-based solutions are in demand. This paper presents a novel disk-resident index, the RB-tree, to support spatial approximate keyword queries. We study how to augment the R-tree with approximate keyword searching capability based on existing solutions, and store two kinds of bitmaps in R-tree nodes to build an RB-tree. The RB-tree supports a wide range of spatial conditions, such as range and nearest neighbor, combined with keyword similarity metrics such as edit distance and Dice similarity. Experimental results against the R-tree on two real-world datasets demonstrate the efficiency of our solution.
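The edit distance metric mentioned above is the standard Levenshtein dynamic program, sketched here with a rolling one-row table:

```python
def edit_distance(a, b):
    """Levenshtein distance between strings a and b, computed with a
    single rolling row: dp[j] holds the distance between the current
    prefix of a and the first j characters of b."""
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (ca != cb))  # substitution
    return dp[-1]
```

A query keyword then matches an object keyword when their edit distance is within the user's tolerance.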
Social Roles Discovery of Moving Objects Based on Spatial-Temporal Associated Semantics and Temporal Entropy of Trajectories
Ma Yuchi, Yang Ning, Xie Lin, Li Chuan, and Tang Changjie
2012, 49(10):  2153-2160. 
Existing measures of trajectory similarity pay little attention to the spatial-temporal semantics and temporal randomness of trajectories, and as a result they are likely to misclassify moving objects by their social roles. This research addresses the problem from four aspects. Firstly, it proposes the concept of spatial-temporal associated semantics (STAS), which is proportional to the probability that different moving objects pass by areas of the same type at the same time. Secondly, it proposes the concept of temporal entropy, which quantifies the randomness of the time instants at which a trajectory passes by areas of the same type. Thirdly, it proposes a new similarity measure, trajectory semantic similarity (TSS), which combines STAS and temporal entropy and captures the spatial-temporal characteristics of the social roles of trajectories. Finally, it presents an algorithm, SRDA (social roles discovering algorithm), to cluster trajectories based on TSS; each resulting cluster represents a different social role. Extensive experiments on real and synthetic data sets show that SRDA improves average accuracy by 18% with linear time complexity, which validates its effectiveness and efficiency.
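The temporal entropy concept can be illustrated with Shannon entropy over the distribution of visit times (an illustrative sketch; the paper's exact formulation may differ):

```python
import math
from collections import Counter

def temporal_entropy(visit_hours):
    """Shannon entropy (bits) of the times at which a trajectory visits
    areas of one type. Low entropy means regular, predictable visits
    (e.g. an office worker arriving at 9 every day); high entropy means
    the visit times are close to random."""
    counts = Counter(visit_hours)
    total = len(visit_hours)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```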
Fuzzy Spatio-Temporal Range Querying over Uncertain Moving Objects
Chen Yifei, Qin Xiaolin, and Li Bohan
2012, 49(10):  2161-2170. 
Uncertainty and fuzziness are semantically different in spatio-temporal data management and related applications. We propose a novel type of query, the fuzzy spatio-temporal range (FSTR) query over uncertain moving objects, which integrates location uncertainty with users' preferences expressed qualitatively as fuzzy conditions or items. Both the temporal and the spatial search conditions of an FSTR query are vague, i.e., they have no crisp boundaries, and the queries are executed on uncertain datasets. To address these two kinds of indeterminacy, we use fuzzy sets to represent fuzzy query conditions and probability density functions (pdfs) to represent the possible distributions of objects' locations. We present the qualifying guarantee evaluation of objects against vague query conditions, and propose pruning techniques based on the α-cut of a fuzzy set to shrink the search space efficiently. We also design rules to reject non-qualifying objects and validate qualifying objects, avoiding unnecessarily costly numeric integration in the refinement step. The approach makes no assumption about objects' pdfs and is applicable to arbitrary kinds of pdfs. FSTR queries can be seen as a general form of existing certain and uncertain range queries. An empirical study demonstrates the efficiency and effectiveness of the algorithms under various experimental settings; the results show that about 30%–90% of the objects in the query results are obtained directly by the proposed rules, without costly matching degree evaluation.
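The α-cut pruning idea can be illustrated with a trapezoidal membership function for a fuzzy condition like "around noon" (an illustrative sketch; the function names and parameterization are not from the paper):

```python
def trapezoid(x, a, b, c, d):
    """Trapezoidal membership function: 0 outside (a, d), 1 on [b, c],
    linear ramps in between — a common shape for fuzzy query conditions."""
    if x <= a or x >= d:
        return 0.0
    if b <= x <= c:
        return 1.0
    if x < b:
        return (x - a) / (b - a)
    return (d - x) / (d - c)

def alpha_cut_interval(alpha, a, b, c, d):
    """Crisp interval of the alpha-cut: all points with membership >= alpha.
    Any object lying entirely outside this interval can be pruned without
    evaluating its membership (or integrating its pdf)."""
    return (a + alpha * (b - a), d - alpha * (d - c))
```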
Topic-relevant Region Queries in Spatial Database
Liu Junling, Yu Ge, and Sun Huanliang
2012, 49(10):  2171-2180. 
Spatial query processing is widely applied in location-based services and facility selection. This paper proposes and solves a novel type of spatial query, the topic-relevant region (T2R) query, which can be used for spatial decision analysis and location selection. Given a topic T defined by a feature set R and a query window q, a T2R query retrieves the k non-overlapping regions with the highest relevance values, computed from the number of feature objects and their importance. For example, the finance topic may be defined by stock exchanges, insurance companies, and banks; on this topic, a T2R query finds finance centers, i.e., regions in which these feature objects are densely distributed. We propose three algorithms to process T2R queries: Baseline (BSL), Filtering-Refinement (FR), and Shrink (SHR). SHR obtains the best pruning performance of the three by clustering regions with high relevance values and shrinking them. The proposed algorithms solve the problem of ranking regions at arbitrary locations by their relevance to the topic. We evaluate the efficiency of the proposed algorithms with extensive experiments on real and synthetic datasets under a wide range of parameter settings, and verify the effectiveness of T2R queries with real topic queries.
Threshold-Based Heuristic Algorithm for Influence Maximization
Chen Hao and Wang Yitong
2012, 49(10):  2181-2188. 
Influence maximization is the problem of finding a subset of nodes in a social network that maximizes the spread of influence. This optimization problem is proved NP-hard. Kempe and Kleinberg proposed a natural hill-climbing greedy algorithm that repeatedly chooses the node providing the best marginal influence (the KK algorithm). The KK algorithm achieves a large spread of influence but is too costly for large social networks. We propose a threshold-based heuristic algorithm (TBH) for influence maximization based on the activation thresholds of nodes. The algorithm computes potential influenced nodes (PIN) according to the dynamically changing activation thresholds of nodes during the activation process. In the heuristic phase, the nodes with maximal PIN are selected as seed nodes; then, in the greedy phase, the nodes with maximal marginal influence gain are greedily selected as seeds. Our experiments demonstrate that, even without the greedy phase, the performance of our algorithm is close to that of the KK algorithm while its running time is much shorter. The experimental results also show that our algorithm outperforms HPG with the same heuristic factor c.
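For context, the Kempe-Kleinberg style greedy baseline that TBH competes with can be sketched with Monte Carlo spread estimation under the independent cascade model (an illustrative sketch, not the paper's TBH algorithm; graph format and parameters are assumptions):

```python
import random

def simulate_ic(graph, seeds, p, rng):
    """One independent-cascade run: each newly active node activates
    each inactive neighbor independently with probability p."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        nxt = []
        for u in frontier:
            for v in graph.get(u, []):
                if v not in active and rng.random() < p:
                    active.add(v)
                    nxt.append(v)
        frontier = nxt
    return active

def greedy_seeds(graph, k, p=0.5, runs=200, seed=0):
    """Hill-climbing greedy: repeatedly add the node with the largest
    estimated marginal gain in expected spread. The repeated simulation
    is exactly what makes this too costly for large networks."""
    rng = random.Random(seed)
    nodes = set(graph) | {v for vs in graph.values() for v in vs}
    seeds = []
    for _ in range(k):
        def spread(s):
            return sum(len(simulate_ic(graph, s, p, rng)) for _ in range(runs)) / runs
        best = max((n for n in nodes if n not in seeds),
                   key=lambda n: spread(seeds + [n]))
        seeds.append(best)
    return seeds
```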
Sliding Window Top-K Frequent Item Query on Uncertain Stream
Wang Shuang, and Wang Guoren
2012, 49(10):  2189-2197. 
Frequent item detection plays an important role in many applications, such as network monitoring, network intrusion detection, and association rule mining. While it has been studied intensively for static uncertain data, frequent item detection is still novel in the emerging field of uncertain data streams. In this paper, based on the sliding window model, an efficient algorithm (sTopK-UFI) is proposed to detect frequent items over uncertain data streams. To reduce computation cost, the algorithm incrementally updates the exact top-k results upon each window slide, eliminating the need for complete recomputation from scratch. In addition, based on the current data in the sliding window, items likely to become frequent in the future are predicted according to the Poisson distribution. By computing upper and lower bounds on the probability, pruning rules are proposed that reduce the candidate set and improve query efficiency. Extensive experiments on real and synthetic datasets show that sTopK-UFI is an effective solution to frequent item detection over uncertain data streams: it significantly reduces the candidate set, saves memory space, shortens execution time, and meets the requirements of practical applications.
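The Poisson-based prediction step can be illustrated as follows: if an item's future support is modeled as Poisson(λ), the chance it reaches a support threshold s is a tail probability, and items whose tail probability falls below a cut-off can be pruned from the candidate set (an illustrative sketch; the paper's actual bounds and model may differ):

```python
import math

def poisson_tail(lmbda, s):
    """P[X >= s] for X ~ Poisson(lambda): the modeled probability that an
    item's support in the next window reaches the frequency threshold s."""
    return 1.0 - sum(math.exp(-lmbda) * lmbda**k / math.factorial(k)
                     for k in range(s))
```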
A Network Community Detection Method Based on Dynamic Model of Synchronization
Huang Jianbin, Bai Yang, Kang Jianmei, Zhong Xiang, Zhang Xin, and Sun Heli
2012, 49(10):  2198-2207. 
A network community detection algorithm, SYN, is proposed based on the Kuramoto model, a dynamic model of synchronization. Firstly, the vertices of a network are sorted according to the link densities between them; each vertex is thus projected to a one-dimensional value and the network is transformed into vector data. During clustering, the data are synchronized within a local region, and data points that synchronize together are considered a community. By enlarging the synchronization radius, the method can detect the multi-resolution community structure of a network, and through the modularity function it automatically selects the optimal clustering result. The method does not depend on any assumption about the data distribution, and it can detect communities of arbitrary number, size, and shape. Experimental results on a large number of real-world and synthetic networks show that the method achieves high accuracy.
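The Kuramoto dynamics underlying SYN can be sketched with a simple Euler integration (an illustrative sketch of the model itself, with identical natural frequencies, not of the SYN algorithm):

```python
import math

def kuramoto_step(phases, neighbors, K, dt):
    """One Euler step of the Kuramoto model: each oscillator's phase is
    pulled toward its neighbors' phases with coupling strength K."""
    return {i: theta + dt * K * sum(math.sin(phases[j] - theta)
                                    for j in neighbors.get(i, []))
            for i, theta in phases.items()}

def synchronize(phases, neighbors, K=1.0, dt=0.1, steps=200):
    """Iterate the dynamics; coupled oscillators drift toward a common
    phase, which is the 'synchronized together = one community' intuition."""
    for _ in range(steps):
        phases = kuramoto_step(phases, neighbors, K, dt)
    return phases
```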
Computing Expected Shortest Distance in Uncertain Graphs
Li Mingpeng, Zou Zhaonian, Gao Hong, and Zhao Zhengli
2012, 49(10):  2208-2220. 
This paper focuses on the shortest distance problem in uncertain graphs, which we call the expected shortest distance problem. We analyze the complexity of this problem and prove that it admits no polynomial-time algorithm. To solve it, we use random sampling to obtain possible worlds of the uncertain graph, compute the shortest distance in each, and estimate the expected shortest distance as the average of the finite values. To improve efficiency, we propose two pruning techniques that allow a random sampling process to terminate earlier. Furthermore, observing that different sampling orders of edges do not influence the result of sampling but do determine the number of edges sampled in a sampling process, we propose two edge sampling orders that reduce the number of edges sampled in each process. We then propose an approximation algorithm based on random sampling with antithetic variables, which is an unbiased estimator of the expected shortest distance, and prove that it outperforms direct random sampling in both efficiency and sampling variance, the latter being the key criterion for evaluating the quality of an unbiased estimator. Experiments on real uncertain protein-protein interaction networks demonstrate the efficiency and accuracy of our algorithm.
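The direct sampling estimator that the paper improves on can be sketched as follows (a plain Monte Carlo baseline without the paper's pruning, sampling orders, or antithetic variables; worlds where the target is unreachable are simply skipped, matching the "average of the finite values" above):

```python
import heapq
import random

def sample_world(edges, rng):
    """Sample one possible world: each uncertain edge (u, v, weight, prob)
    survives independently with its probability."""
    return [(u, v, w) for u, v, w, p in edges if rng.random() < p]

def dijkstra(edges, src, dst):
    """Shortest distance from src to dst in an undirected weighted graph."""
    adj = {}
    for u, v, w in edges:
        adj.setdefault(u, []).append((v, w))
        adj.setdefault(v, []).append((u, w))
    dist = {src: 0.0}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            return d
        if d > dist.get(u, float("inf")):
            continue
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return float("inf")

def expected_shortest_distance(edges, src, dst, samples=2000, seed=0):
    """Monte Carlo estimate: average the shortest distance over sampled
    possible worlds, ignoring worlds where dst is unreachable."""
    rng = random.Random(seed)
    dists = [dijkstra(sample_world(edges, rng), src, dst) for _ in range(samples)]
    finite = [d for d in dists if d < float("inf")]
    return sum(finite) / len(finite)
```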
Selection-Verification-Filtering: An Iterative Subgraph Containment Query Processing Strategy
Lu Jianhua, Zhang Baili, Jiang Shan, Lu Ningyun, and Wang Feifei
2012, 49(10):  2221-2228. 
Graph data is ubiquitous in applications such as chemical compounds, proteins, and social networks, and efficient subgraph containment query processing on large graph databases is a challenging issue. The "filtering-verification" mechanism is widely used: it first constructs feature-based index structures, then filters the database down to a small candidate set with the help of the indices, and finally verifies each candidate to obtain the final results. The filtering phase is critical, since fewer candidates lead to better performance; the verification phase is comparatively simple, leaving little room to improve overall performance by optimizing it. This paper proposes "selection-verification-filtering", a novel iterative three-phase subgraph containment query processing strategy that exploits information from the verification phase and graph similarity mapping relationships. It first selects one graph from the database for subgraph isomorphism verification against the query graph; if the verification succeeds, the final results are obtained directly, otherwise the search space for the next selection is narrowed. A graph selection method based on search space prediction is then introduced to improve the probability of successful verification. Extensive experimental results show that the proposed algorithm significantly outperforms the "filtering-verification" mechanism in running time.
An Adaptive Probability Broadcast-Based Data Preservation in Wireless Sensor Networks
Liang Junbin, and Li Taoshen
2012, 49(10):  2229-2240. 
In some wireless sensor networks deployed in harsh environments, it may not be feasible to deploy a sink and keep it connected. To prevent data loss due to exhaustion of a node's limited energy or physical damage, each node should disseminate its data to a subset of nodes in the network for storage. However, since each node knows only local information and has limited storage space, the processes of data dissemination and storage are hard to control. An adaptive probability broadcast-based data preservation protocol, APBDP, is proposed to solve this problem. In APBDP, each node disseminates its data with an adaptive probability broadcast mechanism, which not only lets all nodes receive each data packet but also reduces redundant packet transmissions, conserving energy effectively. Moreover, each node stores the received data using LT codes. After all data are stored, a collector can recover all of it by visiting only a small subset of nodes. Theoretical analysis and experiments show that APBDP achieves higher decoding performance and energy efficiency than existing protocols.
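LT-code storage and recovery, as used by APBDP, can be sketched with XOR packets and a peeling decoder (an illustrative sketch: it uses a uniform degree choice rather than the robust soliton distribution LT codes normally use, and integer block data for simplicity):

```python
import random

def lt_encode(blocks, n_packets, rng):
    """Each encoded packet XORs a random nonempty subset of data blocks
    and records which block indices it covers."""
    packets = []
    for _ in range(n_packets):
        idx = rng.sample(range(len(blocks)), rng.randint(1, len(blocks)))
        val = 0
        for i in idx:
            val ^= blocks[i]
        packets.append((set(idx), val))
    return packets

def lt_decode(packets, k):
    """Peeling decoder: repeatedly resolve degree-1 packets (one unknown
    block) and XOR recovered blocks out of the remaining packets."""
    packets = [[set(ix), v] for ix, v in packets]
    known = {}
    changed = True
    while changed and len(known) < k:
        changed = False
        for pkt in packets:
            resolved = {i for i in pkt[0] if i in known}
            for i in resolved:
                pkt[1] ^= known[i]
            pkt[0] -= resolved
            if len(pkt[0]) == 1:
                i = next(iter(pkt[0]))
                if i not in known:
                    known[i] = pkt[1]
                    changed = True
    return known
```

The rateless property is what lets the collector recover everything from a small, arbitrary subset of storage nodes.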
SMap: Semantically Mapping Relational Database Schemas to OWL Ontologies
Jia Cunxin, Hu Wei, Bai Wenyang, and Qu Yuzhong
2012, 49(10):  2241-2250. 
Ontologies proliferate with the development of the Semantic Web. Most data on the Web, however, are still stored in relational databases (RDBs). Creating mappings between RDB schemas and ontologies is an effective way of establishing interoperability between them. In this paper, we propose SMap, a semantic approach to creating mappings between RDB schemas and OWL ontologies. SMap consists of two main stages: finding simple mappings and learning complex mappings. In the first stage, reverse engineering rules are applied to classify the elements of an RDB schema and an ontology into corresponding categories; virtual documents for the elements are built according to their categories and then matched for similarity. In the second stage, based on the pre-found simple mappings as well as some overlapping RDB records and ontology instances, the facts used for inductive logic programming (ILP) are collected automatically, constituting the background knowledge and positive examples. Different types of Horn-rule-like complex mappings are then learnt with a bottom-up ILP algorithm. Experimental results on real-world datasets demonstrate that SMap significantly outperforms existing approaches on both simple mapping finding and complex mapping learning, and that such Horn-rule-like mappings have clear semantics and can be used directly for query rewriting.
KWSDS: A Top-k Keyword Search System in Relational Databases
Tang Mingzhu, Yang Yan, Guo Xuequan, Shen Zhonghui, and Zhong Yingli
2012, 49(10):  2251-2259. 
Keyword search over relational databases has become a hot topic in the field of information retrieval, as it provides users with little SQL knowledge a simple and friendly query interface. The algorithms of existing keyword search systems are mainly based on either the database graph or the schema graph; however, using either graph alone yields low efficiency, and the accuracy of the results is also not high. This paper designs and implements a top-k keyword search system, KWSDS (keyword search system based on database graph and schema graph). After users input keywords, it eliminates dirty keywords through pre-processing. The method of combining the database graph and the schema graph to solve the keyword search problem is proposed for the first time. The paper also devises search algorithms within a single table and across tables, proves their correctness, and analyzes their time complexity. A relevance-based ranking method is proposed as well. The KWSDS algorithms run faster than existing ones, and the system returns results with high accuracy and excellent query performance. Finally, KWSDS is validated by extensive experiments.
An Implementation of Attributive Predicate Lock in Database System
Shou Lidan, Hu Wei, Luo Xinyuan, Chen Ke, and Chen Gang
2012, 49(10):  2260-2270. 
In todays application scenarios of OLTP database, transactions are generally made up of simple queries, especially ones based on primary keys. In such cases, the predicate lock based on semantics of simple comparisons can be used to solve the problem of Non-repeatable read and phantom read, which guarantees the serializable schedule of transactions, leaving out complex logical judgments. In consideration of the advantages of predicate lock, we propose an extended version called Attributive Predicate Lock, which supports modifications on different attribute domains in the same data row, thus improve the concurrent capabilities of transactions. We discuss the feasibility of this theory in the given complexity and OLAP application scenarios, and give a detailed implementation of such a lock system. Based on the theory and implementation, we also conduct a simulation of TPC-C concurrent transactions on the framework of transaction threads in the national ShenTong Database System, and compared its performance with that of physical lock system. The experiment results prove that Attributive Predicate Lock can schedule transactions correctly with better performance of CPU and memory in the scenario of simple queries and updates on fixed attribute columns, which will be of great improvement to the concurrent capabilities in OLAP applications.