ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2022, Vol. 59 ›› Issue (2): 264-281. doi: 10.7544/issn1000-1239.20210913

Special topic: 2022 Special Topic on Spatial Data Intelligence

• Artificial Intelligence •

基于空间占有度的主导并置模式挖掘

方圆1,2,王丽珍3,王晓璇3,杨培忠3   

  1. 1(云南大学数学与统计学院 昆明 650500);2(云南大学西南天文研究所 昆明 650500);3(云南大学信息学院 昆明 650500) (fangyuan@ynu.edu.cn)
  • 出版日期: 2022-02-01
  • 基金资助: 
    国家自然科学基金项目(61966036,61662086);云南省创新团队基金项目(2018HC019);云南大学博士后基金项目(C176220200)

Spatial Occupancy-Based Dominant Co-Location Patterns Mining

Fang Yuan1,2, Wang Lizhen3, Wang Xiaoxuan3, Yang Peizhong3   

  1(School of Mathematics and Statistics, Yunnan University, Kunming 650500); 2(South-Western Institute for Astronomy Research, Yunnan University, Kunming 650500); 3(School of Information Science and Engineering, Yunnan University, Kunming 650500)
  • Online: 2022-02-01
  • Supported by: 
    This work was supported by the National Natural Science Foundation of China (61966036, 61662086), the Project of Innovative Team of Yunnan Province (2018HC019), and the Post Doctor Foundation of Yunnan University (C176220200).

摘要: 传统的空间并置模式挖掘旨在发现空间中实例频繁共存的特征子集.目前空间并置模式的大多数研究都将模式的频繁性作为兴趣度度量.然而,在实际应用场景中,用户往往不仅对特征集的频繁性感兴趣,而且对它的完整性也感兴趣.结合并置模式的频繁性和完整性,提出主导空间并置模式(dominant spatial co-location patterns, DSCPs),目的是为用户提供一组高质量的并置模式.具体地,在空间并置模式挖掘任务中引入了模式占有度,以衡量并置模式的完整性.我们通过同时考虑模式的完整性和频繁性形式化了主导并置模式挖掘的问题.设计了一个挖掘主导并置模式的基本算法,为了降低计算开销,提出了一系列的剪枝策略及新颖的数据结构改进基本算法的挖掘效率.在合成数据集和真实数据集上进行了实验,评估了所提出算法的效率和有效性,验证了剪枝策略能够大幅提高算法效率.在实际应用中的挖掘结果表明了主导并置模式挖掘的合理性和可用性.

关键词: 空间数据挖掘, 主导并置模式, 占有度度量, 频繁性度量, 空间关联规则

Abstract: Traditional spatial co-location pattern mining aims to discover subsets of spatial features whose instances are frequently located together in geographic neighborhoods. Most previous studies use the prevalence of a pattern as the sole interestingness measure. However, users are often interested not only in the prevalence of a feature set but also in its completeness, namely the portion of co-location instances that a pattern occupies in their neighborhoods. By combining prevalence and completeness, we provide users with a set of higher-quality co-location patterns called dominant spatial co-location patterns (DSCPs). First, we introduce an occupancy measure into the spatial co-location pattern mining task to quantify the completeness of co-location patterns. Second, we formulate the problem of DSCP mining by considering both completeness and prevalence. Third, we present a basic algorithm for discovering DSCPs and, to reduce its high computational cost, propose a series of pruning strategies and novel data structures that improve its efficiency. Finally, experiments on synthetic and real-world data sets evaluate the efficiency and effectiveness of the proposed algorithms. The running times on synthetic data sets show that the pruning strategies substantially improve efficiency, and the mining results in two real-world applications demonstrate that DSCPs are reasonable and usable.
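
As a rough illustration of how prevalence and completeness can be combined, the sketch below scores one candidate pattern on toy data. The prevalence function follows the participation index commonly used in co-location mining. The completeness function `toy_occupancy`, the data layout, and the two thresholds are hypothetical simplifications introduced here for illustration; they are not the occupancy definition or the algorithm of the paper.

```python
from collections import defaultdict

def participation_index(pattern, row_instances, feature_counts):
    """Prevalence: the minimum, over the pattern's features, of the share of
    that feature's instances that take part in at least one row instance."""
    seen = defaultdict(set)
    for row in row_instances:              # row maps feature -> instance id
        for feature, inst_id in row.items():
            seen[feature].add(inst_id)
    return min(len(seen[f]) / feature_counts[f] for f in pattern)

def toy_occupancy(pattern, row_instances, neighborhood_features):
    """Hypothetical completeness score (an assumption, not the paper's measure):
    the average share of the features found around a row instance that the
    pattern itself covers."""
    if not row_instances:
        return 0.0
    ratios = []
    for row in row_instances:
        around = set(pattern)
        for feature, inst_id in row.items():
            around |= neighborhood_features.get((feature, inst_id), set())
        ratios.append(len(pattern) / len(around))
    return sum(ratios) / len(ratios)

def is_dominant(prevalence, occupancy, min_prev, min_occ):
    """Assumed selection rule: both scores must reach user-given thresholds."""
    return prevalence >= min_prev and occupancy >= min_occ

# Toy data (all values invented for illustration).
pattern = ("A", "B")
feature_counts = {"A": 4, "B": 2, "C": 3}          # instances per feature
row_instances = [{"A": 1, "B": 1}, {"A": 2, "B": 1}]
neighborhood_features = {                           # features near each instance
    ("A", 1): {"B", "C"},
    ("B", 1): {"A"},
    ("A", 2): {"B"},
}

prev = participation_index(pattern, row_instances, feature_counts)
occ = toy_occupancy(pattern, row_instances, neighborhood_features)
print(prev, occ, is_dominant(prev, occ, min_prev=0.4, min_occ=0.6))
```

In this toy setup a pattern can be prevalent yet score low on completeness when its instances are usually surrounded by features outside the pattern, which is the situation the combined criterion is meant to filter out.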

Key words: spatial data mining, dominant spatial co-location patterns (DSCPs), occupancy metrics, prevalence metrics, spatial association rules
