An Efficient Join Algorithm for Data on Tertiary Storage

Abstract: Online use of tertiary storage offers a low-cost, feasible approach to massive data management. For a database management system to use tertiary storage devices online, algorithms for relational operations on those devices, and join algorithms in particular, are among the key problems that must be solved. This paper presents an efficient join algorithm for tertiary storage. Experimental results show that the algorithm outperforms previous algorithms in both performance and scalability and greatly reduces I/O cost. When the data volume is large, its performance is no worse than that of disk-based join algorithms. These results indicate that tertiary storage can be used online in massive database systems just as disks are, addressing key problems such as the storage and online querying of massive databases.