Efficient Dimensionality Reduction and Query Algorithm of Trajectory Data Based on Spatial Position Relation
-
摘要:
由于新型信息技术的快速发展,社会处于数字化、信息化转型的关键时期,各行业对于以数据库技术为基础的信息系统的需求也日益凸显. 基于位置的服务依赖于海量实时生成的轨迹数据,在处理亿万级随时间连续变化的轨迹数据时,降维算法和查询技术一直是研究的关键,通过降低轨迹数据的规模,减少查询操作时处理数据的时间,能有效提升查询的性能,而能否实现高质量、高效率查询对于数据库而言至关重要. 提出了面向轨迹数据的均匀网格编码,并在进一步优化后提出非均匀网格降维算法,将轨迹数据的坐标转化为1维字符串存储,对不符合要求的网格进行合并处理;通过空间位置映射充分保留轨迹数据间复杂的相互关系,并采用范围查询与最近邻查询对降维后的数据进行性能测试. 实验使用不同城市真实轨迹数据与模拟生成轨迹数据作为数据集,将提出的均匀网格算法、非均匀网格算法与3种基准方法进行对比. 实验证明,优化后的非均匀网格算法降维后数据的空间位置关系相似度可高达82.50%,范围查询时间较其他查询时间提升了至少73.86%,最近邻查询时间提升了至少52.26%,与其他基准方法相比取得了更好的效果.
Abstract:Due to the rapid development of information technology, society is in a critical period of digitalization and information transformation, and the demand for information systems based on database technology in various industries is becoming increasingly prominent. Location-based services rely on massive real-time generated trajectory data. In the processing of hundreds of millions of continuously changing trajectory data, dimensionality reduction algorithm and query technology have been the key to research. By reducing the scale of trajectory data and reducing the time of data processing during query operations, the performance of query can be effectively improved, and whether high-quality and efficient query can be achieved is very important for the database. In this paper, a UGC(uniform grid code) and a NGDR(non-uniform grid dimensionality reduction algorithm) for trajectory data are proposed, which convert the coordinates of trajectory data into one-dimensional string storage, merge the grids that do not meet the requirements, fully retain the complex interrelationship between trajectory data through spatial position mapping, and use range query and nearest neighbor query to test the performance of the reduced data. The real trajectory and virtual generated trajectory data in different cities are used as datasets, and the uniform grid code algorithm, non-uniform grid algorithm proposed in this paper are compared with three benchmark methods. Experiments show that the spatial position relationship similarity of the data after NGDR can be up to 82.5%. The range query time of NGDR is improved at least by 73.86% compared with the other queries, and the nearest neighbour query time is improved at least by 52.26%, which achieves better results than other benchmark methods.
-
-
表 1 均匀网格符号定义
Table 1 Definition of Uniform Grid Symbols
符号 说明 M 轨迹数据总数 Ul 单位网格长度 Uw 单位网格宽度 Num 单位网格总数 PNi 网格i中轨迹数据点的个数 表 2 轨迹数据集属性表
Table 2 Table of Attributes of Trajectory Data Set
城市 时间范围 轨迹条数 轨迹点数 采样频率/s 北京 2008年2月2日15时至8日15时 3203 5222752 300 深圳 2013年10月22日0时至24时 302 917006 30 柏林 2003年11月20日6时至9时 562 51544 120 -
[1] Koldewey C, Rabe M, Dumitrescu R, et al. Introduction to the minitrack on data-driven services in manufacturing: Innovation, engineering, transformation, and management[C/OL]// Proc of the 56th Hawaii Int Conf on System Science. Hawaii: HICSS, 2023[2023-01-03].https://scholarspace.manoa.hawaii.edu/items/02775c54-ee1a-44b4-aafe-78bac2b11fc5
[2] Duentgen C , Behr T , Gueting R H . BerlinMOD: A benchmark for moving object databases[J]. Very Large Data Bases Journal, 2009, 18(6): 1335−1368
[3] 吴万青,赵永新,王巧,等. 一种满足差分隐私的轨迹数据安全存储和发布方法[J]. 计算机研究与发展,2021,58(11):2430−2443 doi: 10.7544/issn1000-1239.2021.20210589 Wu Wanqing, Zhao Yongxin, Wang Qiao, et al. A safe storage and release method of trajectory data satisfying differential privacy[J]. Journal of Computer Research and Development, 2021, 58(11): 2430−2443 (in Chinese) doi: 10.7544/issn1000-1239.2021.20210589
[4] Li Jiangneng, Wang Zheng, Cong Gao, et al. Towards designing and learning piecewise space-filling curves [C]// Proc of the 49th Int Conf on Very Large Data Bases. New York: ACM, 2023: 2158−2171
[5] Fujiwara T, Kuo Y H, Ynnerman A, et al. Feature learning for nonlinear dimensionality reduction toward maximal extraction of hidden patterns[C]// Proc of the 16th Pacific Visualization Symp (PacificVis). Piscataway, NJ: IEEE, 2023: 122−131
[6] Roweis S, Saul L. Nonlinear dimensionality reduction by locally linear embedding[J]. Science, 2000, 290(5500): 2323−2326
[7] Berchtold S, Christian B, Kriegel H P. The pyramid-technique: Towards breaking the curse of dimensionality[C]// Proc of the 1998 ACM SIGMOD Int Conf on Management of Data. New York: ACM, 1998: 142−153
[8] Sundararajan R R. Principal component analysis using frequency components of multivariate time series[J]. Computational Statistics & Data Analysis, 2021, 157: 107164
[9] Zhu Xiaofeng, Zhang Shichao, Jin Zhi, et al. Missing value estimation for mixed-attribute data sets[J]. IEEE Transactions on Knowledge and Data Engineering, 2010, 23(1): 110−121
[10] 王靖. 流形学习的理论与方法研究[D]. 杭州:浙江大学,2006 Wang Jing. Research on manifold learning: Theories and approaches [D]. Hangzhou: Zhejiang University, 2006 (in Chinese)
[11] Wang Jiaming, Shao Zhenfeng, Huang Xiao, et al. Deep locally linear embedding network[J]. Information Sciences, 2022, 614: 416−431 doi: 10.1016/j.ins.2022.10.036
[12] Arena P, Patanè L, Spinosa A G. Robust modelling of binary decisions in laplacian eigenmaps-based echo state networks[J]. Engineering Applications of Artificial Intelligence, 2020, 95: 103828 doi: 10.1016/j.engappai.2020.103828
[13] Zhang Peng, Qiao Hong, Zhang Bo. An improved local tangent space alignment method for manifold learning[J]. Pattern Recognition Letters, 2011, 32(2): 181−189 doi: 10.1016/j.patrec.2010.10.005
[14] Kruskal, Joseph B. Nonmetric multidimensional scaling: A numerical method[J]. Psychometrika, 1964, 29(2): 115−129 doi: 10.1007/BF02289694
[15] Hamm K, Henscheid N, Kang Shujie. Wassmap: Wasserstein isometric mapping for image manifold learning[J]. SIAM Journal on Mathematics of Data Science, 2023, 5(2): 475−501 doi: 10.1137/22M1490053
[16] Jeng Yihnen, Lin Sanyih, Lee Zongshiaw. Smoothing of the multiple one-dimensional adaptive grid procedure[J]. AIAA Journal, 2000, 38(12): 2353−2355 doi: 10.2514/2.905
[17] Verma J P V, Mankad S H, Garg S. GeoHash tag based mobility detection and prediction for traffic management[J]. SN Applied Sciences, 2020, 2: 1−13 doi: 10.1007/s42452-019-1685-8
[18] Schler M, Grebhahn A, Schrter R, et al. QuEval: Beyond high-dimensional indexing à la carte[J]. VLDB Endowment, 2013, 6(14): 1654−1665 doi: 10.14778/2556549.2556551
[19] Asano T, Ranjan D, Roos T, et al. Space-filling curves and their use in the design of geometric data structures[J]. Theoretical Computer Science, 1997, 181(1): 3−15 doi: 10.1016/S0304-3975(96)00259-9
[20] Morton G M. A computer oriented geodetic data base and a new technique in file sequencing [J/OL]. Physics of Plasmas, 1966[2023-03-09].https://www.researchgate.net/publication/242678487_A_Computer_Oriented_Geodetic_Data_Base_and_a_New_Technique_in_File_Sequencing
[21] Wu Yuhao, Cao Xuefeng, An Zipeng. A spatiotemporal trajectory data index based on the Hilbert curve code[J]. IOP Conf Series Earth and Environmental Science, 2020, 45(9): 1403−1411
[22] Huang Jian, Zhao Jinling, Guo Yixiao, et al. The application on distributed geospatial data management based on Hadoop and the application in WebGIS[C/OL]// Proc of the 9th Int Conf on Agro-Geoinformatics (Agro-Geoinformatics). Piscataway, NJ: IEEE, 2021 [2023-03-12].https://ieeexplore.ieee.org/stamp/stamp.jsp?tp=&arnumber=9530350
[23] Lei Yi, Tong Xiaochong, Wang Dali, et al. W-Hilbert: A W-shaped Hilbert curve and coding method for multiscale geospatial data index[J]. International Journal of Applied Earth Observation and Geoinformation, 2023, 118: 103298 doi: 10.1016/j.jag.2023.103298
[24] Lin Xuelian, Ma Shuai, Jiang Jiahao, et al. Error bounded line simplification algorithms for trajectory compression: An experimental evaluation[J]. ACM Transactions on Database Systems, 2021, 46(3): 1−44
[25] Zheng Kai, Zhao Yan, Lian Defu, et al. Reference-based framework for spatio-temporal trajectory compression and query processing[J]. IEEE Transactions on Knowledge and Data Engineering, 2019, 32(11): 2227−2240
[26] Fikioris G, Patroumpas K, Artikis A, et al. Optimizing vessel trajectory compression for maritime situational awareness[J]. GeoInformatica, 2023, 27(3): 565−591 doi: 10.1007/s10707-022-00475-0
[27] Kulkarni T. Answering range queries under local differential privacy[C]//Proc of the 2019 Int Conf on Management of Data. New York: ACM, 2019: 1832−1834
[28] Ghane S, Kulik L, Ramamoharao K. A differentially private algorithm for range queries on trajectories[J]. Knowledge and Information Systems, 2021, 63(2): 277−303 doi: 10.1007/s10115-020-01520-w
[29] Li Song, Li Bohan, Yu Jiaxi, et al. Probabilistic threshold K-ann query method based on uncertain Voronoi diagram in Internet of vehicles[J]. IEEE Transactions on Intelligent Transportation Systems, 2020, 22(6): 3592−3602
[30] Wang Beiyi, Zhang Xiaohong, Wang Weibing. Feature matching method based on SURF and fast library for approximate nearest neighbor search[J]. Integrated Ferroelectrics, 2021, 218(1): 147−154 doi: 10.1080/10584587.2021.1911336
[31] Zhao Peng, Rao Weixiong, Zhang Chengxi, et al. SST: Synchronized spatial-temporal trajectory similarity search[J]. GeoInformatica, 2020, 24: 777−800 doi: 10.1007/s10707-020-00405-y
[32] Zheng Bolong, Weng Lianggui, Zhao Xi, et al. Repose: Distributed top-k trajectory similarity search with local reference point tries[C]// Proc of the 37th IEEE Int Conf on Data Engineering (ICDE). Piscataway, NJ: IEEE, 2021: 708−719
[33] Mokbel M F, Aref W G, Kamel I. Analysis of multi-dimensional space-filling curves[J]. GeoInformatica, 2003, 7(3): 179−209 doi: 10.1023/A:1025196714293
[34] Liu Jian, Liu Gaofeng. Algorithm of coordinates conversion in Gauss-Kruger projection [J/OL]. Computer Simulation, 2005, 10[2022-12-20]. http://en.cnki.com.cn/Article_en/CJFDTOTAL-JSJZ200510031.htm
-
期刊类型引用(6)
1. 徐雪峰,郭广伟,黄余. 改进全卷积神经网络的遥感图像小目标检测. 机械设计与制造. 2024(10): 38-42 . 百度学术
2. 刘雯雯,汪皖燕,程树林. 融合项目热门惩罚因子改进协同过滤推荐方法. 计算机技术与发展. 2023(03): 15-19 . 百度学术
3. 冯勇,刘洋,王嵘冰,徐红艳,张永刚. 面向用户需求的生成对抗网络多样性推荐方法. 小型微型计算机系统. 2023(06): 1192-1197 . 百度学术
4. 冯晨娇,宋鹏,张凯涵,梁吉业. 融合社交网络信息的长尾推荐方法. 模式识别与人工智能. 2022(01): 26-36 . 百度学术
5. 韩迪,陈怡君,廖凯,林坤玲. 推荐系统中的准确性、新颖性和多样性的有效耦合与应用. 南京大学学报(自然科学). 2022(04): 604-614 . 百度学术
6. 甘亚男,耿生玲,郝立. 超贝叶斯图模型及其联结树的构建. 青海师范大学学报(自然科学版). 2021(02): 42-48 . 百度学术
其他类型引用(8)