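• Top-Quality Science and Technology Journal of China
  • CCF-recommended Class A Chinese journal
  • High-quality sci-tech journal in computing, Tier T1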
Zhang Peng, Duan Lei, Qin Pan, Zuo Jie, Tang Changjie, Yuan Chang’an, Peng Jian. Mining Top-k Distinguishing Sequential Patterns Using Spark[J]. Journal of Computer Research and Development, 2017, 54(7): 1452-1464. DOI: 10.7544/issn1000-1239.2017.20160553

Mining Top-k Distinguishing Sequential Patterns Using Spark

  • Published Date: June 30, 2017
  • A distinguishing sequential pattern (DSP) is a sequence that occurs frequently in the sequence set of the target class but infrequently in the sequence set of the non-target class. Because DSPs describe the differences between two sets of sequences, DSP mining has wide applications, such as building sequence classifiers, characterizing the biological features of DNA sequences, and analyzing the behavior of specified groups of people. Compared with mining DSPs that satisfy predefined support thresholds, mining the top-k DSPs ranked by a contrast measure spares users from setting support thresholds, which are easy to set improperly, and is therefore more user-friendly. However, the conventional algorithm for mining top-k DSPs cannot handle large-scale sequence data sets. To break this limitation, a parallel mining method using Spark, named SP-kDSP-Miner (Spark-based top-k DSP miner), is designed for mining top-k DSPs from large-scale sequence data sets. Furthermore, to improve the efficiency of SP-kDSP-Miner, a novel candidate pattern generation strategy, several pruning strategies, and a parallel method for computing the contrast scores of candidate patterns are proposed, all tailored to the characteristics of the Spark architecture. Experiments on both real-world and synthetic data sets demonstrate that SP-kDSP-Miner is effective, efficient, and scalable.
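The abstract characterizes a DSP by how often it occurs in the target versus the non-target sequence set, and ranks candidates by a contrast score. The single-machine sketch below illustrates that idea under assumptions that are not taken from the paper: a pattern occurs in a sequence when it is a subsequence (no gap constraints), support is the fraction of sequences containing the pattern, and the contrast score is the target support minus the non-target support. The names `contains`, `support`, and `top_k_dsp` are hypothetical. Candidates are grown by appending symbols, with one generic pruning bound: no extension of a pattern can score higher than that pattern's own target support, so a branch is skipped once that support cannot beat the current k-th best score. This is not SP-kDSP-Miner's candidate generation or pruning. In a Spark setting, the per-sequence containment tests behind each support count are the natural part to distribute.

```python
from fractions import Fraction
import heapq


def contains(sequence, pattern):
    """True if `pattern` is a (not necessarily contiguous) subsequence of `sequence`.

    Assumption: no gap constraints between matched symbols.
    """
    it = iter(sequence)
    # `symbol in it` consumes the iterator up to the first match, preserving order.
    return all(symbol in it for symbol in pattern)


def support(pattern, sequences):
    """Fraction of `sequences` that contain `pattern` (0 for an empty set)."""
    if not sequences:
        return Fraction(0)
    return Fraction(sum(contains(s, pattern) for s in sequences), len(sequences))


def top_k_dsp(target, non_target, k, max_len=3):
    """Hypothetical helper: up to k (score, pattern) pairs, best first.

    Sequences are strings of single-character events (an assumption for brevity).
    Assumed contrast score: support in `target` minus support in `non_target`.
    """
    if k <= 0:
        return []
    alphabet = sorted(set().union(*target, *non_target))
    heap = []  # min-heap of (score, pattern): the best k patterns seen so far

    def grow(prefix):
        for symbol in alphabet:
            pattern = prefix + symbol
            sup_t = support(pattern, target)
            # Generic pruning bound: for any extension Q of `pattern`,
            # contrast(Q) <= support(Q, target) <= sup_t. If sup_t cannot beat
            # the current k-th best score, skip this pattern and its extensions.
            if len(heap) == k and sup_t <= heap[0][0]:
                continue
            score = sup_t - support(pattern, non_target)
            if len(heap) < k:
                heapq.heappush(heap, (score, pattern))
            elif score > heap[0][0]:
                heapq.heapreplace(heap, (score, pattern))
            if len(pattern) < max_len:
                grow(pattern)

    grow("")
    return sorted(heap, key=lambda item: (-item[0], item[1]))
```

For example, `top_k_dsp(["abcd", "abd", "acbd"], ["bcd", "bd", "cd"], 4)` returns the patterns `a`, `ab`, `abd`, and `ad`, each with a contrast score of 1, because every target sequence contains them and no non-target sequence does. Exact `Fraction` arithmetic keeps score comparisons free of floating-point ties.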