DMFUCP: A Distributed Mining Framework for Universal Companion Patterns on Large-Scale Trajectory Data
Abstract: The widespread use of mobile positioning devices has made user location data easy to collect, and the volume of trajectory data is growing rapidly. Universal companion pattern mining aims to discover highly similar behavior paths of moving objects across the spatial and temporal dimensions. Efficient and accurate mining methods for large-scale trajectory data are valuable for tasks such as discovering user preferences and building new business models, but they are also challenging to design. On the one hand, massive and continuously growing trajectory data demands good scalability, which centralized mining strategies cannot provide. On the other hand, existing distributed mining frameworks pay too little attention to supplying high-quality input for efficient pattern mining and to handling the large number of loose connections in trajectory data, which leaves room for improvement. We propose DMFUCP, a distributed two-stage framework for universal companion pattern mining that embeds optimizations for data preprocessing and loose connection analysis. In the data preprocessing stage, DMFUCP uses DBSCANCD, a density clustering algorithm that incorporates movement direction, and TCB, a cluster balancing algorithm, so that subsequent mining tasks receive high-quality trajectory input with less noise. In the mining stage, DMFUCP uses GSPR, a G pruning repartition algorithm, and SAE, a segmented enumeration algorithm. GSPR uses a parameter G to segment long trajectories and then repartitions all segments to handle loose connections more effectively. SAE relies on multithreading and forward closure to maintain mining performance. Experiments on real datasets show that, compared with existing universal companion pattern mining frameworks, DMFUCP discovers universal companion patterns more effectively while reducing the time needed to mine each set of universal companion patterns by 20% to 40%.
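To make the idea of segmenting long trajectories with a parameter G concrete, here is a minimal sketch. It is not the authors' GSPR algorithm, whose details the abstract does not give. It assumes that G is a maximum allowed gap between consecutive timestamps at which an object is observed, and that a trajectory is cut wherever that gap is exceeded. The function name `split_by_gap` and the integer timestamp representation are hypothetical.

```python
from typing import List


def split_by_gap(timestamps: List[int], g: int) -> List[List[int]]:
    """Split a sorted timestamp sequence wherever neighbours differ by more than g.

    Assumption: g plays the role of a maximum allowed gap; this is an
    illustrative reading of "parameter G", not the paper's definition.
    """
    segments: List[List[int]] = []
    current: List[int] = []
    for t in timestamps:
        # Close the current segment when the gap to the previous timestamp exceeds g.
        if current and t - current[-1] > g:
            segments.append(current)
            current = []
        current.append(t)
    if current:
        segments.append(current)
    return segments


if __name__ == "__main__":
    # Gaps of 4 between 3 and 7, and between 8 and 12, exceed g=2.
    print(split_by_gap([1, 2, 3, 7, 8, 12], g=2))  # [[1, 2, 3], [7, 8], [12]]
```

In a distributed setting, segments produced this way could then be regrouped across partitions, which is roughly the role the abstract assigns to the repartitioning step; how DMFUCP actually does so is not specified here.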