    Citation: Fang Yuan, Wang Lizhen, Wang Xiaoxuan, Yang Peizhong. Spatial Occupancy-Based Dominant Co-Location Patterns Mining[J]. Journal of Computer Research and Development, 2022, 59(2): 264-281. DOI: 10.7544/issn1000-1239.20210913

    Spatial Occupancy-Based Dominant Co-Location Patterns Mining

    Abstract: Traditional spatial co-location pattern mining aims to discover subsets of spatial features whose instances are frequently located together in geographic neighborhoods. Most existing studies use the prevalence of a pattern as the interestingness measure. In real applications, however, users are often interested not only in the prevalence of a feature set but also in its completeness, namely the portion of co-location instances that a pattern occupies in their neighborhood. Combining prevalence and completeness, we propose dominant spatial co-location patterns (DSCPs), which aim to provide users with a set of high-quality co-location patterns. Specifically, we introduce an occupancy measure into the spatial co-location pattern mining task to measure the completeness of a pattern, and we formulate the DSCP mining problem by considering completeness and prevalence together. We then present a basic algorithm for discovering DSCPs and, to reduce its high computational cost, propose a series of pruning strategies and a novel data structure that improve its efficiency. Experiments on synthetic and real-world data sets evaluate the efficiency and effectiveness of the proposed algorithms; the running times show that the pruning strategies substantially improve efficiency. The mining results in two real-world applications demonstrate that DSCPs are reasonable and usable.

       
