Abstract:
Online use of tertiary storage systems provides a cost-effective and feasible scheme for managing massive data. To extend a database system to manage data on tertiary storage, implementing the relational operators, especially the join operation, is one of the key problems that must be resolved. This paper presents an efficient tertiary join algorithm. Experimental results show that the algorithm outperforms previous join algorithms in both performance and scalability, and that it greatly reduces I/O cost. When the data volume is very large, the algorithm performs even better than a disk-based join. These results show that tertiary storage can be used to manage massive data as effectively as disks, addressing the key problem of storing and querying massive data.