A Sequential Pattern Mining Algorithm Based on Large-Itemset Reuse

Song Shijie, Hu Huaping, Zhou Jiawei, and Jin Shiyao

Abstract: This paper redefines the length of a sequential pattern, which increases the granularity of sequential pattern mining. On that basis it presents HVSM, a sequential pattern mining algorithm based on large-itemset reuse that scans the database first horizontally and then vertically. HVSM represents the database as a vertical bitmap. It first extends itemsets horizontally and collects all mined large itemsets as one-large-sequence itemsets. It then extends sequences vertically. Each one-large-sequence itemset is treated as an integrated block (a "container"), so large itemsets are reused when mining k-large-sequences. Candidate large sequences are generated with sibling nodes as seeds, and support is counted by recording the 1st-TID. Experiments show that HVSM improves mining efficiency on large transaction databases. On medium-sized and large transaction databases it finds frequent sequences faster than the SPAM algorithm.
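The abstract gives no pseudocode or data-structure details for HVSM. The Python sketch below therefore illustrates only the two-phase idea it describes, under assumptions that are not taken from the paper. Vertical bitmaps are stored as one Python int per customer sequence, with bit t standing for transaction t, as in SPAM. Horizontal itemset extension is a bitwise AND (an I-step). Vertical sequence extension is a SPAM-style S-step that needs only the first set bit of each customer's prefix bitmap; reading "1st-TID" this way is an assumption. Frequent itemsets from phase 1 are reused as whole blocks in phase 2. A child node is tried only with the blocks that were frequent extensions of its parent, which is one plausible reading of "sibling nodes as seeds." The names `item_bitmap`, `i_step`, `s_step`, `mine_large_itemsets`, `mine_sequences`, `DB` and `MIN_SUP` are hypothetical.

```python
# Minimal sketch of a two-phase, bitmap-based sequential pattern miner.
# ASSUMPTIONS (not from the paper): SPAM-style bitmaps, an S-step that uses only the
# first set bit ("1st-TID" reading), and sibling pruning. All names and data are hypothetical.

# Hypothetical toy database: one time-ordered list of transactions per customer.
DB = [
    [{"a", "b"}, {"c"}, {"a", "b", "c"}],
    [{"a", "b"}, {"c"}],
    [{"b"}, {"a", "c"}],
    [{"a", "b"}, {"b", "c"}],
]
MIN_SUP = 3  # assumed absolute minimum support (number of customers)


def item_bitmap(db, item):
    """Vertical bitmap: entry c has bit t set iff `item` is in transaction t of customer c."""
    return [sum(1 << t for t, tx in enumerate(seq) if item in tx) for seq in db]


def support(bm):
    """Support = number of customers whose bitmap has at least one set bit."""
    return sum(1 for b in bm if b)


def i_step(bm1, bm2):
    """Horizontal (itemset) extension: both parts must occur in the same transaction."""
    return [x & y for x, y in zip(bm1, bm2)]


def s_step(prefix, block):
    """Vertical (sequence) extension: keep occurrences of `block` strictly after the
    first transaction in which `prefix` ends (only that first position is needed)."""
    out = []
    for p, b in zip(prefix, block):
        if p == 0:
            out.append(0)
            continue
        first = (p & -p).bit_length() - 1           # index of the lowest set bit
        out.append(b & ~((1 << (first + 1)) - 1))  # clear bits 0..first
    return out


def mine_large_itemsets(db, min_sup):
    """Phase 1: grow itemsets with I-steps; every frequent itemset becomes a reusable block."""
    items = sorted({i for seq in db for tx in seq for i in tx})
    singles = {}
    for i in items:
        bm = item_bitmap(db, i)
        if support(bm) >= min_sup:
            singles[i] = bm
    blocks = {(i,): bm for i, bm in singles.items()}
    frontier = list(blocks.items())
    while frontier:
        nxt = []
        for itemset, bm in frontier:
            for j, jbm in singles.items():
                if j > itemset[-1]:  # append only larger items, so each itemset is built once
                    new_bm = i_step(bm, jbm)
                    if support(new_bm) >= min_sup:
                        blocks[itemset + (j,)] = new_bm
                        nxt.append((itemset + (j,), new_bm))
        frontier = nxt
    return blocks


def mine_sequences(blocks, min_sup):
    """Phase 2: append whole blocks with S-steps, reusing their phase-1 bitmaps."""
    results = {}

    def dfs(pattern, bm, candidates):
        frequent = []
        for blk in candidates:
            new_bm = s_step(bm, blocks[blk])
            if support(new_bm) >= min_sup:
                frequent.append((blk, new_bm))
        siblings = [blk for blk, _ in frequent]
        for blk, new_bm in frequent:
            child = pattern + (blk,)
            results[child] = support(new_bm)
            # Children are tried only with frequent siblings (safe by the Apriori property).
            dfs(child, new_bm, siblings)

    for blk, bm in blocks.items():
        results[(blk,)] = support(bm)
        dfs((blk,), bm, list(blocks))
    return results


if __name__ == "__main__":
    patterns = mine_sequences(mine_large_itemsets(DB, MIN_SUP), MIN_SUP)
    for pattern, sup in sorted(patterns.items()):
        print(pattern, sup)
```

On the toy database the sketch reports seven patterns, including ((b), (c)) with support 4 and ((a, b), (c)) with support 3. Reusing the phase-1 bitmap of a multi-item block means each vertical extension costs a single S-step, and the I-steps that built that block are never repeated.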
