ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2019, Vol. 56 ›› Issue (3): 666-676.doi: 10.7544/issn1000-1239.2019.20170750

• 软件技术 • 上一篇    

基于不均匀空间划分和R树的时空索引

赵馨逸1,黄向东1,2,乔嘉林1,康荣1,李娜1,王建民1,2   

  1. 1(清华大学软件学院 北京 100084); 2(工业大数据系统与应用北京市重点实验室 北京 100084) (stefanie_xin@163.com)
  • 出版日期: 2019-03-01
  • 基金资助: 
    国家重点研发计划项目(2016YFB0501504);国家自然科学基金项目(U1509213);中国博士后科学基金项目(2017M620784)

A Spatio-Temporal Index Based on Skew Spatial Coding and R-Tree

Zhao Xinyi1, Huang Xiangdong1,2, Qiao Jialin1, Kang Rong1, Li Na1, Wang Jianmin1,2   

  1. 1(School of Software, Tsinghua University, Beijing 100084); 2(Beijing Key Laboratory for Industrial Bigdata System and Application, Beijing 100084)
  • Online: 2019-03-01

摘要: 随着移动互联网以及物联网的发展,越来越多的移动设备都内置GPS服务,从而产生了大量的时空数据.这些数据体量大、分布不均匀且带有时间和空间经纬度等多维属性.传统的时空索引还有很多问题有待解决,例如难以处理大规模数据、无法同时处理时间和空间维度等.基于Geohash和R-Tree,提出一种2层时空索引GRIST(Geohash and R-Tree based index for spatio-temporal data),第1层是空间索引,它将空间划分为不同大小的网格并使用Geohash进行编码;第2层是时间索引,由R-Tree构成,不同R-Tree索引不同网格里的数据.GRIST索引支持面向时间和面向时空的查询.在大量随机数据和真实Uber数据上的实验表明:GRIST在索引的构建效率上较于GeoMesa和PostGIS系统可以提升10~45倍,在查询效率上可以提升2~4倍.

关键词: 时空数据, 时空索引, 时空范围查询, 轨迹查询, 分布式数据管理

Abstract: With the development of mobile Internet and Internet of Things, more and more mobile devices such as vehicles and mobile phones are embedded with GPS services. These devices generate a lot of spatio-temporal data with multi-dimensional attributes such as time, space longitude and latitude when moving. Traditional spatio-temporal indexes have neither the ability to handle large amounts of data, nor the ability to handle data with both time and space dimensions, nor the ability to support many kinds of spatio-temporal query. This paper illustrates GRIST (Geohash and R-Tree based index for spatio-temporal data), an innovative two-level spatio-temporal index based on Geohash and R-Tree. The first layer is a spatial index based on grid partition, which partitions the space into non-overlapping subspaces with different size. Each grid is encoded by Geohash. The second layer is the time layer index based on the R-Tree, where each spatial grid corresponds to a one-dimensional R-Tree index. Also, this paper supports two types of queries, one is time-based query, the other is spatio-temporal-based query which can be further subdivided into range query and trajectory query. In addition, this paper realizes GRIST by NoSQL K-V database Cassandra and uses Spark to realizes distributed query. Compared with PostGIS and GeoMesa, GRIST has better load and query performance, 10~45 times better in load time and 2~4 times better in query time.

Key words: spatio-temporal data, spatio-temporal index, spatio-temporal range query, trajectory query, distributed data management

中图分类号: