Abstract:
Materialized views can significantly reduce expensive network transfer costs and improve query efficiency in a Web data integration system. A fundamental problem is how to select queries to materialize under space constraints while maximizing the benefit of the materialized views. Traditional methods do not take the containment relationships among massive numbers of XML queries into account; hence the selected materialized views may contain redundant information. A new model and associated methods are proposed to overcome these problems. The contributions include (1) a QC (query containment) model that describes the massive query set in a Web data integration system and captures the most common relationship among the queries, namely containment; and (2) a method, based on the QC model, for selecting views to materialize from the query set. This method considers the key factors in view selection, including query frequency, query space cost, query rewriting capability, and query result completeness, and proposes query bitmaps to organize the materialized views, thus generating a more reasonable view selection plan. Experimental results demonstrate the validity of the method.
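To make the selection problem concrete, the following is a minimal illustrative sketch, not the method proposed in the paper. It assumes a simple greedy heuristic that ranks each candidate view by the total frequency of newly answerable queries per unit of space, and it uses integer bitmasks as a stand-in for query bitmaps. The names `Query`, `select_views`, and all field names and numbers are hypothetical; query rewriting capability and result completeness are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Query:
    name: str
    frequency: int  # how often the query is issued (assumed > 0)
    size: int       # space needed to materialize its result (assumed > 0)
    contains: frozenset = frozenset()  # names of queries answerable from this one


def select_views(queries, budget):
    """Greedy sketch: repeatedly pick the view with the best gain per unit of space."""
    index = {q.name: i for i, q in enumerate(queries)}
    # Bitmask of the queries each candidate can answer (itself plus contained ones).
    masks = {
        q.name: sum(1 << index[n] for n in q.contains | {q.name})
        for q in queries
    }
    covered = 0          # bitmask of queries already answerable from chosen views
    remaining = budget   # space still available
    chosen = []
    while True:
        best, best_score = None, 0.0
        for q in queries:
            if q.name in chosen or q.size > remaining:
                continue
            new_bits = masks[q.name] & ~covered
            # Frequency of queries that would become answerable by adding q.
            gain = sum(
                p.frequency for p in queries if (new_bits >> index[p.name]) & 1
            )
            score = gain / q.size
            if score > best_score:
                best, best_score = q, score
        if best is None:  # nothing fits, or nothing adds new coverage
            return chosen
        chosen.append(best.name)
        covered |= masks[best.name]
        remaining -= best.size


# Hypothetical workload: q1 contains q2 and q3.
queries = [
    Query("q1", frequency=10, size=6, contains=frozenset({"q2", "q3"})),
    Query("q2", frequency=30, size=4),
    Query("q3", frequency=20, size=4),
    Query("q4", frequency=5, size=2),
]
print(select_views(queries, budget=8))  # prints ['q1', 'q4']
```

In this toy workload, materializing q1 already answers q2 and q3, so the sketch never selects them afterwards, even when space remains. This is the kind of redundancy that a containment-aware selection is meant to avoid.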