Abstract:
This paper presents HVSM, a sequential pattern mining algorithm that scans the database first horizontally and then vertically and is based on large-itemset reuse. The algorithm redefines the length of a sequential pattern, which increases the granularity of sequential pattern mining. Treating the database as a vertical bitmap, the algorithm first extends itemsets horizontally and finds all large itemsets, which are called one-large-sequence itemsets. It then extends sequences vertically, using each one-large-sequence itemset as a “container” for mining k-large-sequences. Candidate large sequences are generated by taking sibling nodes as child nodes, and support is counted by recording the 1st-TID. Experiments show that HVSM finds frequent sequences faster than the SPAM algorithm on medium-sized and large transaction databases.
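The two extension steps can be illustrated with a small sketch. This is not the HVSM implementation. It is a toy example that assumes a per-customer bitmask layout (bit t set when an item occurs in that customer's t-th transaction) and one plausible reading of "recording the 1st-TID", namely keeping, for each customer, the earliest transaction at which a pattern is completed. The database `DB` and the helper names `vertical_layout`, `itemset_bitmap`, `first_tid`, and `extend_sequence` are hypothetical and chosen for illustration only.

```python
from collections import defaultdict

# Toy database (hypothetical): customer id -> ordered list of transactions.
DB = {
    1: [{"a", "b"}, {"c"}, {"a", "b", "c"}],
    2: [{"a"}, {"b", "c"}],
    3: [{"a", "b"}, {"b"}, {"c"}],
}


def vertical_layout(db):
    """Map each item to {customer: bitmask}; bit t is set if the item
    occurs in that customer's t-th transaction."""
    layout = defaultdict(dict)
    for cid, transactions in db.items():
        for t, items in enumerate(transactions):
            for item in items:
                layout[item][cid] = layout[item].get(cid, 0) | (1 << t)
    return layout


def itemset_bitmap(layout, itemset):
    """Horizontal extension: AND the bitmasks of all items in the itemset.
    Customers with no transaction containing the whole itemset are dropped."""
    items = list(itemset)
    result = dict(layout[items[0]])
    for item in items[1:]:
        other = layout[item]
        result = {
            cid: bits & other[cid]
            for cid, bits in result.items()
            if cid in other and bits & other[cid]
        }
    return result


def first_tid(bits):
    """Index of the lowest set bit, i.e. the earliest matching transaction."""
    return (bits & -bits).bit_length() - 1


def extend_sequence(first_tids, itemset_bits):
    """Vertical extension: append an itemset to a sequence.
    A customer supports the longer sequence if the itemset occurs in a
    transaction strictly after the recorded first TID; the new first TID
    is the earliest such transaction."""
    extended = {}
    for cid, tid in first_tids.items():
        later = itemset_bits.get(cid, 0) >> (tid + 1)
        if later:
            extended[cid] = tid + 1 + first_tid(later)
    return extended


layout = vertical_layout(DB)
ab = itemset_bitmap(layout, ("a", "b"))          # customers containing (a b)
ab_first = {cid: first_tid(bits) for cid, bits in ab.items()}
ab_then_c = extend_sequence(ab_first, itemset_bitmap(layout, ("c",)))
print(len(ab), len(ab_then_c))                   # supports of <(a b)> and <(a b)(c)>
```

In this sketch the support of a pattern is the number of customers left in the dictionary, so only one integer per customer has to be carried from one vertical extension to the next.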