• 中国精品科技期刊
  • CCF推荐A类中文期刊
  • 计算领域高质量科技期刊T1类
高级检索

基于空间位置关系的轨迹数据高效降维和查询算法

巢成, 蒲非凡, 许建秋, 高云君

巢成, 蒲非凡, 许建秋, 高云君. 基于空间位置关系的轨迹数据高效降维和查询算法[J]. 计算机研究与发展, 2024, 61(7): 1771-1790. DOI: 10.7544/issn1000-1239.202330609
引用本文: 巢成, 蒲非凡, 许建秋, 高云君. 基于空间位置关系的轨迹数据高效降维和查询算法[J]. 计算机研究与发展, 2024, 61(7): 1771-1790. DOI: 10.7544/issn1000-1239.202330609
Chao Cheng, Pu Feifan, Xu Jianqiu, Gao Yunjun. Efficient Dimensionality Reduction and Query Algorithm of Trajectory Data Based on Spatial Position Relation[J]. Journal of Computer Research and Development, 2024, 61(7): 1771-1790. DOI: 10.7544/issn1000-1239.202330609
Citation: Chao Cheng, Pu Feifan, Xu Jianqiu, Gao Yunjun. Efficient Dimensionality Reduction and Query Algorithm of Trajectory Data Based on Spatial Position Relation[J]. Journal of Computer Research and Development, 2024, 61(7): 1771-1790. DOI: 10.7544/issn1000-1239.202330609
巢成, 蒲非凡, 许建秋, 高云君. 基于空间位置关系的轨迹数据高效降维和查询算法[J]. 计算机研究与发展, 2024, 61(7): 1771-1790. CSTR: 32373.14.issn1000-1239.202330609
引用本文: 巢成, 蒲非凡, 许建秋, 高云君. 基于空间位置关系的轨迹数据高效降维和查询算法[J]. 计算机研究与发展, 2024, 61(7): 1771-1790. CSTR: 32373.14.issn1000-1239.202330609
Chao Cheng, Pu Feifan, Xu Jianqiu, Gao Yunjun. Efficient Dimensionality Reduction and Query Algorithm of Trajectory Data Based on Spatial Position Relation[J]. Journal of Computer Research and Development, 2024, 61(7): 1771-1790. CSTR: 32373.14.issn1000-1239.202330609
Citation: Chao Cheng, Pu Feifan, Xu Jianqiu, Gao Yunjun. Efficient Dimensionality Reduction and Query Algorithm of Trajectory Data Based on Spatial Position Relation[J]. Journal of Computer Research and Development, 2024, 61(7): 1771-1790. CSTR: 32373.14.issn1000-1239.202330609

基于空间位置关系的轨迹数据高效降维和查询算法

基金项目: 国家自然科学基金项目(U23A20296)
详细信息
    作者简介:

    巢成: 1999年生. 硕士. 主要研究方向为移动对象数据库

    蒲非凡: 1999年生. 硕士研究生. 主要研究方向为时序数据管理

    许建秋: 1982年生. 博士,教授,博士生导师. CCF高级会员. 主要研究方向为数据软件、空间数据库、移动对象数据库、可扩充数据库

    高云君: 1977年生. 博士,教授,博士生导师. CCF高级会员. 主要研究方向为数据库、大数据管理与分析、DB与AI融合

    通讯作者:

    许建秋(jianqiu@nuaa.edu.cn

  • 中图分类号: TP391

Efficient Dimensionality Reduction and Query Algorithm of Trajectory Data Based on Spatial Position Relation

Funds: This work was supported by the National Natural Science Foundation of China (U23A20296).
More Information
    Author Bio:

    Chao Cheng: born in 1999. Master. Her main research interest includes moving objects databases

    Pu Feifan: born in 1999. Master candidate. His main research interest includes time-series data management. (pff1999@nuaa.edu.cn

    Xu Jianqiu: born in 1982. PhD, professor, PhD supervisor. Senior member of CCF. His main research interests include data software, spatial databases, moving object databases, and extensible databases

    Gao Yunjun: born in 1977. PhD, professor, PhD supervisor. Senior member of CCF. His main research interests include database, big data management and analytics, and DB4AI and AI4DB. (gaoyj@zju.edu.cn

  • 摘要:

    由于新型信息技术的快速发展,社会处于数字化、信息化转型的关键时期,各行业对于以数据库技术为基础的信息系统的需求也日益凸显. 基于位置的服务依赖于海量实时生成的轨迹数据,在处理亿万级随时间连续变化的轨迹数据时,降维算法和查询技术一直是研究的关键,通过降低轨迹数据的规模,减少查询操作时处理数据的时间,能有效提升查询的性能,而能否实现高质量、高效率查询对于数据库而言至关重要. 提出了面向轨迹数据的均匀网格编码,并在进一步优化后提出非均匀网格降维算法,将轨迹数据的坐标转化为1维字符串存储,对不符合要求的网格进行合并处理;通过空间位置映射充分保留轨迹数据间复杂的相互关系,并采用范围查询与最近邻查询对降维后的数据进行性能测试. 实验使用不同城市真实轨迹数据与模拟生成轨迹数据作为数据集,将提出的均匀网格算法、非均匀网格算法与3种基准方法进行对比. 实验证明,优化后的非均匀网格算法降维后数据的空间位置关系相似度可高达82.50%,范围查询时间较其他查询时间提升了至少73.86%,最近邻查询时间提升了至少52.26%,与其他基准方法相比取得了更好的效果.

    Abstract:

    Due to the rapid development of information technology, society is in a critical period of digitalization and information transformation, and the demand for information systems based on database technology in various industries is becoming increasingly prominent. Location-based services rely on massive real-time generated trajectory data. In the processing of hundreds of millions of continuously changing trajectory data, dimensionality reduction algorithm and query technology have been the key to research. By reducing the scale of trajectory data and reducing the time of data processing during query operations, the performance of query can be effectively improved, and whether high-quality and efficient query can be achieved is very important for the database. In this paper, a UGC(uniform grid code) and a NGDR(non-uniform grid dimensionality reduction algorithm) for trajectory data are proposed, which convert the coordinates of trajectory data into one-dimensional string storage, merge the grids that do not meet the requirements, fully retain the complex interrelationship between trajectory data through spatial position mapping, and use range query and nearest neighbor query to test the performance of the reduced data. The real trajectory and virtual generated trajectory data in different cities are used as datasets, and the uniform grid code algorithm, non-uniform grid algorithm proposed in this paper are compared with three benchmark methods. Experiments show that the spatial position relationship similarity of the data after NGDR can be up to 82.5%. The range query time of NGDR is improved at least by 73.86% compared with the other queries, and the nearest neighbour query time is improved at least by 52.26%, which achieves better results than other benchmark methods.

  • 图  1   50个对象时空分布图

    Figure  1.   Schematic diagram of spatial-temporal distribution for 50 objects

    图  2   均匀网格编码示意图

    Figure  2.   Schematic diagram of uniform grid encoding

    图  3   轨迹数据均匀网格分布示意图

    Figure  3.   Schematic diagram of the uniform grid distribution of trajectory data

    图  4   轨迹数据非均匀网格分布示意图

    Figure  4.   Schematic diagram of the non-uniform grid distribution of trajectory data

    图  5   范围查询操作工作流

    Figure  5.   Workflow of the range query operation

    图  6   最近邻查询示意图

    Figure  6.   Illustration of the nearest neighbor query

    图  7   不同编码长度的空间位置相似度

    Figure  7.   Spatial position similarity under different coding lengths

    图  8   测试数据集前缀分组图

    Figure  8.   Prefix grouping diagram of test data set

    图  9   不同降维算法的空间位置相似度对比

    Figure  9.   Comparison of spatial position similarity of different dimensionality reduction algorithms

    图  10   深圳市数据集降维时间对比图

    Figure  10.   Comparison of dimensionality reduction time of Shenzhen dataset

    图  11   不同降维算法的初始性能对比

    Figure  11.   Comparison of the initial performance of different dimensionality reduction algorithms

    图  12   不同时间区间对性能的影响

    Figure  12.   Impact of different time intervals on performance

    图  13   优化后不同降维算法的性能对比图

    Figure  13.   Performance comparison of different dimensionality reduction algorithms after optimization

    图  14   不同区域窗口范围的查询时间对比

    Figure  14.   Query time comparison of different regional windows range

    图  15   不同时间窗口范围的查询时间对比

    Figure  15.   Query time comparison of different time windows range

    图  16   最近邻查询时间对比

    Figure  16.   Comparison of nearest neighbor query time

    表  1   均匀网格符号定义

    Table  1   Definition of Uniform Grid Symbols

    符号说明
    M轨迹数据总数
    Ul单位网格长度
    Uw单位网格宽度
    Num单位网格总数
    PNi网格i中轨迹数据点的个数
    下载: 导出CSV

    表  2   轨迹数据集属性表

    Table  2   Table of Attributes of Trajectory Data Set

    城市 时间范围 轨迹条数 轨迹点数 采样频率/s
    北京 2008年2月2日15时至8日15时 3203 5222752 300
    深圳 2013年10月22日0时至24时 302 917006 30
    柏林 2003年11月20日6时至9时 562 51544 120
    下载: 导出CSV
  • [1]

    Koldewey C, Rabe M, Dumitrescu R, et al. Introduction to the minitrack on data-driven services in manufacturing: Innovation, engineering, transformation, and management[C/OL]// Proc of the 56th Hawaii Int Conf on System Science. Hawaii: HICSS, 2023[2023-01-03].https://scholarspace.manoa.hawaii.edu/items/02775c54-ee1a-44b4-aafe-78bac2b11fc5

    [2]

    Duentgen C , Behr T , Gueting R H . BerlinMOD: A benchmark for moving object databases[J]. Very Large Data Bases Journal, 2009, 18(6): 1335−1368

    [3] 吴万青,赵永新,王巧,等. 一种满足差分隐私的轨迹数据安全存储和发布方法[J]. 计算机研究与发展,2021,58(11):2430−2443 doi: 10.7544/issn1000-1239.2021.20210589

    Wu Wanqing, Zhao Yongxin, Wang Qiao, et al. A safe storage and release method of trajectory data satisfying differential privacy[J]. Journal of Computer Research and Development, 2021, 58(11): 2430−2443 (in Chinese) doi: 10.7544/issn1000-1239.2021.20210589

    [4]

    Li Jiangneng, Wang Zheng, Cong Gao, et al. Towards designing and learning piecewise space-filling curves [C]// Proc of the 49th Int Conf on Very Large Data Bases. New York: ACM, 2023: 2158−2171

    [5]

    Fujiwara T, Kuo Y H, Ynnerman A, et al. Feature learning for nonlinear dimensionality reduction toward maximal extraction of hidden patterns[C]// Proc of the 16th Pacific Visualization Symp (PacificVis). Piscataway, NJ: IEEE, 2023: 122−131

    [6]

    Roweis S, Saul L. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323−2326

    [7]

    Berchtold S, Christian B, Kriegel H P. The pyramid-technique: Towards breaking the curse of dimensionality[C]// Proc of the 1998 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 1998: 142−153

    [8]

    Sundararajan R R. Principal component analysis using frequency components of multivariate time series[J]. Computational Statistics & Data Analysis, 2021, 157: 107164

    [9]

    Zhu Xiaofeng, Zhang Shichao, Jin Zhi, et al. Missing value estimation for mixed-attribute data sets[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 23(1): 110−121

    [10] 王靖. 流形学习的理论与方法研究[D]. 杭州:浙江大学,2006

    Wang Jing. Research on manifold learning: Theories and approaches [D]. Hangzhou: Zhejiang University, 2006 (in Chinese)

    [11]

    Wang Jiaming, Shao Zhenfeng, Huang Xiao, et al. Deep locally linear embedding network[J]. Information Sciences, 2022, 614: 416−431 doi: 10.1016/j.ins.2022.10.036

    [12]

    Arena P, Patanè L, Spinosa A G. Robust modelling of binary decisions in laplacian eigenmaps-based echo state networks[J]. Engineering Applications of Artificial Intelligence, 2020, 95: 103828 doi: 10.1016/j.engappai.2020.103828

    [13]

    Zhang Peng, Qiao Hong, Zhang Bo. An improved local tangent space alignment method for manifold learning[J]. Pattern Recognition Letters, 2011, 32(2): 181−189 doi: 10.1016/j.patrec.2010.10.005

    [14]

    Kruskal, Joseph B. Nonmetric multidimensional scaling: A numerical method[J]. Psychometrika, 1964, 29(2): 115−129 doi: 10.1007/BF02289694

    [15]

    Hamm K, Henscheid N, Kang Shujie. Wassmap: Wasserstein isometric mapping for image manifold learning[J]. SIAM Journal on Mathematics of Data Science, 2023, 5(2): 475−501 doi: 10.1137/22M1490053

    [16]

    Jeng Yihnen, Lin Sanyih, Lee Zongshiaw. Smoothing of the multiple one-dimensional adaptive grid procedure[J]. AIAA Journal, 2000, 38(12): 2353−2355 doi: 10.2514/2.905

    [17]

    Verma J P V, Mankad S H, Garg S. GeoHash tag based mobility detection and prediction for traffic management[J]. SN Applied Sciences, 2020, 2: 1−13 doi: 10.1007/s42452-019-1685-8

    [18]

    Schler M, Grebhahn A, Schrter R, et al. QuEval: Beyond high-dimensional indexing à la carte[J]. VLDB Endowment, 2013, 6(14): 1654−1665 doi: 10.14778/2556549.2556551

    [19]

    Asano T, Ranjan D, Roos T, et al. Space-filling curves and their use in the design of geometric data structures[J]. Theoretical Computer Science, 1997, 181(1): 3−15 doi: 10.1016/S0304-3975(96)00259-9

    [20]

    Morton G M. A computer oriented geodetic data base and a new technique in file sequencing [J/OL]. Physics of Plasmas, 1966[2023-03-09].https://www.researchgate.net/publication/242678487_A_Computer_Oriented_Geodetic_Data_Base_and_a_New_Technique_in_File_Sequencing

    [21]

    Wu Yuhao, Cao Xuefeng, An Zipeng. A spatiotemporal trajectory data index based on the Hilbert curve code[J]. IOP Conf Series Earth and Environmental Science, 2020, 45(9): 1403−1411

    [22]

    Huang Jian, Zhao Jinling, Guo Yixiao, et al. The application on distributed geospatial data management based on Hadoop and the application in WebGIS[C/OL]// Proc of the 9th Int Conf on Agro-Geoinformatics (Agro-Geoinformatics). Piscataway, NJ: IEEE, 2021 [2023-03-12].https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9530350

    [23]

    Lei Yi, Tong Xiaochong, Wang Dali, et al. W-Hilbert: A W-shaped Hilbert curve and coding method for multiscale geospatial data index[J]. International Journal of Applied Earth Observation and Geoinformation, 2023, 118: 103298 doi: 10.1016/j.jag.2023.103298

    [24]

    Lin Xuelian, Ma Shuai, Jiang Jiahao, et al. Error bounded line simplification algorithms for trajectory compression: An experimental evaluation[J]. ACM Transactions on Database Systems, 2021, 46(3): 1−44

    [25]

    Zheng Kai, Zhao Yan, Lian Defu, et al. Reference-based framework for spatio-temporal trajectory compression and query processing[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 32(11): 2227−2240

    [26]

    Fikioris G, Patroumpas K, Artikis A, et al. Optimizing vessel trajectory compression for maritime situational awareness[J]. GeoInformatica, 2023, 27(3): 565−591 doi: 10.1007/s10707-022-00475-0

    [27]

    Kulkarni T. Answering range queries under local differential privacy[C]//Proc of the 2019 Int Conf on Management of Data. New York: ACM, 2019: 1832−1834

    [28]

    Ghane S, Kulik L, Ramamoharao K. A differentially private algorithm for range queries on trajectories[J]. Knowledge and Information Systems, 2021, 63(2): 277−303 doi: 10.1007/s10115-020-01520-w

    [29]

    Li Song, Li Bohan, Yu Jiaxi, et al. Probabilistic threshold K-ann query method based on uncertain Voronoi diagram in Internet of vehicles[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 22(6): 3592−3602

    [30]

    Wang Beiyi, Zhang Xiaohong, Wang Weibing. Feature matching method based on SURF and fast library for approximate nearest neighbor search[J]. Integrated Ferroelectrics, 2021, 218(1): 147−154 doi: 10.1080/10584587.2021.1911336

    [31]

    Zhao Peng, Rao Weixiong, Zhang Chengxi, et al. SST: Synchronized spatial-temporal trajectory similarity search[J]. GeoInformatica, 2020, 24: 777−800 doi: 10.1007/s10707-020-00405-y

    [32]

    Zheng Bolong, Weng Lianggui, Zhao Xi, et al. Repose: Distributed top-k trajectory similarity search with local reference point tries[C]// Proc of the 37th IEEE Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2021: 708−719

    [33]

    Mokbel M F, Aref W G, Kamel I. Analysis of multi-dimensional space-filling curves[J]. GeoInformatica, 2003, 7(3): 179−209 doi: 10.1023/A:1025196714293

    [34]

    Liu Jian, Liu Gaofeng. Algorithm of coordinates conversion in Gauss-Kruger projection [J/OL]. Computer Simulation, 2005, 10[2022-12-20]. http://en.cnki.com.cn/Article_en/CJFDTOTAL-JSJZ200510031.htm

  • 期刊类型引用(6)

    1. 徐雪峰,郭广伟,黄余. 改进全卷积神经网络的遥感图像小目标检测. 机械设计与制造. 2024(10): 38-42 . 百度学术
    2. 刘雯雯,汪皖燕,程树林. 融合项目热门惩罚因子改进协同过滤推荐方法. 计算机技术与发展. 2023(03): 15-19 . 百度学术
    3. 冯勇,刘洋,王嵘冰,徐红艳,张永刚. 面向用户需求的生成对抗网络多样性推荐方法. 小型微型计算机系统. 2023(06): 1192-1197 . 百度学术
    4. 冯晨娇,宋鹏,张凯涵,梁吉业. 融合社交网络信息的长尾推荐方法. 模式识别与人工智能. 2022(01): 26-36 . 百度学术
    5. 韩迪,陈怡君,廖凯,林坤玲. 推荐系统中的准确性、新颖性和多样性的有效耦合与应用. 南京大学学报(自然科学). 2022(04): 604-614 . 百度学术
    6. 甘亚男,耿生玲,郝立. 超贝叶斯图模型及其联结树的构建. 青海师范大学学报(自然科学版). 2021(02): 42-48 . 百度学术

    其他类型引用(8)

图(16)  /  表(2)
计量
  • 文章访问数:  145
  • HTML全文浏览量:  62
  • PDF下载量:  57
  • 被引次数: 14
出版历程
  • 收稿日期:  2023-07-25
  • 修回日期:  2023-11-30
  • 网络出版日期:  2024-03-03
  • 刊出日期:  2024-06-30

目录

    /

    返回文章
    返回