    Zhang Kun and Zhu Yangyong. Sequence Pattern Mining Without Duplicate Project Database Scan[J]. Journal of Computer Research and Development, 2007, 44(1): 126-132.

    Sequence Pattern Mining Without Duplicate Project Database Scan

    • Sequential pattern mining has broad applications, including the analysis of Web click streams, disaster prediction, and pattern discovery in DNA and protein sequences. PrefixSpan, which is based on the frequent pattern growth approach, is currently one of the fastest algorithms for this task. However, PrefixSpan produces a large number of duplicated projected databases when mining dense data sets and long sequential patterns. To overcome this drawback, a randomized algorithm named SPMDS is proposed. The algorithm avoids scanning duplicated projected databases by checking digests computed by applying a one-way hash function, such as MD5, to the pseudo projections of the projected databases. It also improves performance by simplifying the search in the projection tree using some necessary conditions. Both experiments and analysis show that SPMDS outperforms PrefixSpan.
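The duplicate check described in the abstract can be illustrated with a minimal sketch. This is not the authors' SPMDS implementation: it assumes sequences of single items (no itemsets), represents a pseudo projection as a list of (sequence id, offset) pairs, and reuses cached results when a digest has been seen before, which is one plausible way to avoid a repeated scan. The names `fingerprint` and `mine` are hypothetical, and the necessary conditions used to simplify the tree search are not specified in the abstract, so they are omitted.

```python
import hashlib
from collections import defaultdict


def fingerprint(projection):
    """MD5 digest of a pseudo projection (list of (sequence id, offset) pairs)."""
    h = hashlib.md5()
    for seq_id, offset in projection:
        h.update(f"{seq_id}:{offset};".encode())
    return h.hexdigest()


def mine(db, min_support):
    """PrefixSpan-style mining that skips projected databases seen before.

    db is a list of sequences of single items (an assumption of this sketch).
    Returns (patterns, stats), where patterns maps a tuple of items to its
    support and stats counts real scans and reused results.
    """
    cache = {}  # digest -> {suffix pattern: support}
    stats = {"scans": 0, "reused": 0}

    def grow(projection):
        key = fingerprint(projection)
        if key in cache:
            # Same digest: treat as a duplicate projected database. A hash
            # collision would make this wrong, hence "randomized".
            stats["reused"] += 1
            return cache[key]
        stats["scans"] += 1

        # One scan: record the first occurrence of each item in every suffix.
        next_proj = defaultdict(list)
        for seq_id, offset in projection:
            seen = set()
            seq = db[seq_id]
            for pos in range(offset, len(seq)):
                item = seq[pos]
                if item not in seen:
                    seen.add(item)
                    next_proj[item].append((seq_id, pos + 1))

        result = {}
        for item, proj in next_proj.items():
            if len(proj) >= min_support:
                result[(item,)] = len(proj)
                for suffix, support in grow(proj).items():
                    result[(item,) + suffix] = support
        cache[key] = result
        return result

    root = [(seq_id, 0) for seq_id in range(len(db))]
    return grow(root), stats
```

Because each projection is built by visiting sequences in id order, two prefixes that lead to the same set of (sequence id, offset) pairs produce the same digest, so the second one is answered from the cache instead of being scanned again.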
