ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2022, Vol. 59 ›› Issue (3): 647-660. doi: 10.7544/issn1000-1239.20200648


DMFUCP: A Distributed Mining Framework for Universal Companion Patterns on Large-Scale Trajectory Data

Zhang Jingwei¹, Liu Shaojian¹, Yang Qing², Zhou Ya¹

  1. Guangxi Key Laboratory of Trusted Software (Guilin University of Electronic Technology), Guilin, Guangxi 541004
  2. Guangxi Key Laboratory of Automatic Detecting Technology and Instrument (Guilin University of Electronic Technology), Guilin, Guangxi 541004
  • Online: 2022-03-07
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61862013, 61662015, U1811264, U1711263), the Natural Science Foundation of Guangxi Autonomous Region of China (2020GXNSFAA159117, 2018GXNSFAA281199, 2017GXNSFAA198035), the Key Project of Guangxi Key Laboratory of Trusted Software (KX202052), and the Foundation of Guangxi Key Laboratory of Automatic Detection Technology and Instrument (YQ19109).

Abstract: The popularity of mobile positioning terminals makes users' locations easily accessible and generates huge amounts of trajectory data. Universal companion pattern mining aims to discover highly overlapping behavior paths among moving objects in the spatio-temporal dimensions; providing effective and efficient pattern mining methods for large-scale trajectories is both valuable and challenging. Mining strategies designed for a centralized environment cannot scale to huge and growing trajectory data. Existing distributed mining frameworks are weak in two respects that must be addressed to improve mining performance: they do not supply effective input for efficient pattern mining, and they handle poorly the large number of loose connections in massive trajectories. In this study, we propose a distributed two-stage mining framework, DMFUCP, which embeds optimizations for data preprocessing and loose connection analysis to provide more efficient and effective universal companion pattern mining. In the data preprocessing stage of DMFUCP, we design a density clustering algorithm, DBSCANCD, and a clustering balance algorithm, TCB, to supply mining tasks with high-quality trajectory data containing less noise. In the mining stage of DMFUCP, we propose a G pruning repartition algorithm, GSPR, and a segmented enumeration algorithm, SAE. GSPR introduces a parameter G to segment long trajectories and then repartitions all segments to process loose connections more effectively. SAE guarantees mining performance through multithreading and forward closure. Compared with existing companion pattern mining frameworks on real datasets, DMFUCP reduces the time required to mine each set of universal companion patterns by 20% to 40% while providing better universal companion pattern discovery capabilities.
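
The segmentation step that the abstract attributes to GSPR can be pictured with a small sketch. The abstract does not define G, so the code below assumes, purely for illustration, that G is the largest tolerated gap between consecutive timestamps at which an object is observed, and that a long timestamp sequence is cut wherever that gap is exceeded. The function name `split_on_gaps` is hypothetical and is not taken from the paper; the repartition step is not shown.

```python
from typing import List


def split_on_gaps(timestamps: List[int], g: int) -> List[List[int]]:
    """Cut a sorted timestamp sequence wherever two neighbors differ by more than g.

    Assumption (not from the paper): g is a maximum allowed gap, so any
    larger gap is treated as a loose connection and starts a new segment.
    """
    segments: List[List[int]] = []
    current: List[int] = []
    for t in timestamps:
        # A gap larger than g closes the current segment.
        if current and t - current[-1] > g:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return segments


# Toy input: gaps of 4 (3 -> 7) and 7 (8 -> 15) both exceed g = 3.
print(split_on_gaps([1, 2, 3, 7, 8, 15], g=3))  # [[1, 2, 3], [7, 8], [15]]
```

Under this assumption, each resulting segment could be assigned to a partition on its own, which is one plausible reading of how segmenting helps a distributed framework cope with loosely connected trajectories.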

Key words: distributed mining framework, loose connections, clustering balance, G pruning repartition, segmented enumeration
