Abstract:
Database management on tertiary storage is becoming increasingly important as applications develop, not only because tertiary devices are used to archive data, but also because the amount of data that applications must handle is growing rapidly. The cost models and query optimization methods of current disk-based database management systems cannot handle massive data on tertiary storage. A cost model that can evaluate relational operations on tertiary-resident data is proposed. The cost of various relational operations can be derived from the cost definitions of several basic data access patterns and the costs of two pattern combination operators. To further improve query efficiency, multiple copies of a relation are stored on tertiary storage, each with a different organization method. The cost model can also evaluate the cost of the same relational operation on different relation copies. Two query optimization methods are also proposed, which choose not only the most efficient implementation algorithm for each relational operator, but also the most I/O-efficient copy of the relation on tertiary storage. The experimental results show that query efficiency for tertiary-resident data can be greatly improved by adopting the proposed cost model and query optimization methods. The introduction of relation copies demonstrates the feasibility of improving query efficiency at the cost of additional storage space.
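The joint choice of implementation algorithm and relation copy described above can be illustrated with a minimal sketch. It is not the proposed cost model or either of the two optimization methods: the class `RelationCopy`, the functions `full_scan_cost`, `ordered_lookup_cost`, and `choose_plan`, and all cost figures are hypothetical placeholders, assumed only to show how a cost estimate could be compared across every (algorithm, copy) pair.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, Iterable, Tuple


@dataclass(frozen=True)
class RelationCopy:
    """One stored copy of a relation (hypothetical structure)."""
    name: str
    organization: str  # illustrative labels, e.g. "sorted_on_key" or "unsorted"
    blocks: int        # size of the copy in storage blocks


# Placeholder cost estimators; a real cost model would replace these.
def full_scan_cost(copy: RelationCopy) -> float:
    # Assumed: reading every block costs one unit per block.
    return float(copy.blocks)


def ordered_lookup_cost(copy: RelationCopy) -> float:
    # Assumed: an ordered lookup is cheap on a sorted copy, expensive otherwise.
    if copy.organization == "sorted_on_key":
        return copy.blocks / 10
    return copy.blocks * 3.0


CostFn = Callable[[RelationCopy], float]


def choose_plan(
    algorithms: Dict[str, CostFn], copies: Iterable[RelationCopy]
) -> Tuple[str, RelationCopy, float]:
    """Return the (algorithm name, copy, cost) triple with the lowest estimate."""
    candidates = (
        (alg_name, copy, cost_fn(copy))
        for (alg_name, cost_fn), copy in product(algorithms.items(), list(copies))
    )
    return min(candidates, key=lambda plan: plan[2])


if __name__ == "__main__":
    copies = [
        RelationCopy("R_sorted", "sorted_on_key", 1000),
        RelationCopy("R_heap", "unsorted", 1000),
    ]
    algorithms = {"full_scan": full_scan_cost, "ordered_lookup": ordered_lookup_cost}
    alg, copy, cost = choose_plan(algorithms, copies)
    print(alg, copy.name, cost)
```

Under these assumed numbers, the sketch selects the ordered lookup on the sorted copy; if only the unsorted copy were available, the full scan would be chosen instead. This reflects the trade-off stated in the abstract, where extra copies consume storage space but widen the set of cheap plans.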