Abstract:
Mining frequent sequential patterns from sequence databases has been a central research topic in data mining and various efficient algorithms for mining sequential patterns have been proposed and studied. Recently, many researchers have not focused on the efficiency of sequential patterns mining algorithms, but have paid attention to how to make users understand the result set of sequential patterns easily, due to the huge number of frequent sequential patterns generated by the mining process. In this paper, the problem of compressing frequent sequential patterns is studied. Inspired by the ideas of compressing frequent itemsets, an algorithm, CFSP (compressing frequent sequential patterns), is developed to mine a few representative sequential patterns to express all the information of all frequent sequential patterns and eliminate a large number of redundant sequential patterns. The CFSP adopts a two-steps approach: in the first step, all closed sequential patterns as the candidate set of representative sequential patterns are obtained, and at the same time most of the representative sequential patterns are obtained; in the second step, finding the remaining representative sequential patterns takes only a little time. An empirical study with both real and synthetic data sets proves that the CFSP has good performance.