Abstract:
Unlike in traditional databases, queries on XML streams are bounded not only by memory but also by real time processing. A novel technique for keyword search over streamed XML fragments is presented, which adopts broadcast model and hole-filler model for XML fragments dissemination, addressing the problem of disordered fragment transmission and considering the quality of searching results due to either keyword mismatch or data absence. Two efficient indexes for candidate elements are developed to further improve the performance: Hierarchical hash table and LCA table. The former indexes structure keywords which act as the structure of result, while the latter indexes the condition keywords which refine the keyword search condition. SLCA computing algorithm, which is triggered by condition keywords, only computes the candidate fragments that involve keywords, thus avoiding redundant operations that will not contribute to the final result. The algorithm produces part of the matched answers continuously without having to wait for the end of the stream. The experiments evaluate the performance of the SLCA algorithm with different types of keywords, different document fragmentation and different keyword frequencies, and compare the SLCA algorithm with other XML keyword matching algorithms. The experiment study shows that the SLCA algorithm performs well on saving processing power and memory space.