ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 (Journal of Computer Research and Development) ›› 2022, Vol. 59 ›› Issue (9): 1869-1886. doi: 10.7544/issn1000-1239.20220012

• Software Technology •

A Method for Tracking and Querying Schema Evolution of Time Series Data

Zhao Xin1, Wan Yingge1, Liu Yingbo1,2

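  1(School of Software, Tsinghua University, Beijing 100084); 2(Beijing Key Laboratory of Industrial Bigdata System and Application (Tsinghua University), Beijing 100084) (zhao-x19@mails.tsinghua.edu.cn)
  • Published: 2022-09-01
  • Supported by: 
    National Key Research and Development Program of China (2019YFB1707402); National Natural Science Foundation of China (62021002); Fusion Application Software Project of the Ministry of Industry and Information Technology (CEIEC-2020-ZM02-0132/06)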

Tracking and Querying over Timeseries Data with Schema Evolution

Zhao Xin1, Wan Yingge1, Liu Yingbo1,2   

  1(School of Software, Tsinghua University, Beijing 100084); 2(Beijing Key Laboratory of Industrial Bigdata System and Application (Tsinghua University), Beijing 100084)
  • Online: 2022-09-01
  • Supported by: 
    This work was supported by the National Key Research and Development Program of China (2019YFB1707402), the National Natural Science Foundation of China (62021002), and the Fusion Application Software Project of Ministry of Industry and Information Technology (CEIEC-2020-ZM02-0132/06).

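Abstract: With the rapid growth of Internet of Things and big data applications, sensing devices of all kinds produce massive amounts of time series data, and the rapid iteration of device management software versions makes the schema evolution problem of time series data increasingly prominent. Schema evolution requires version management of data schemas, so that no information is lost when the schema of the data changes and data can be read and written across schema versions. Taking popular time series database management systems into account, this paper surveys and summarizes the support for schema evolution in various database management systems, gives a formal description of time series data and its schema, and analyzes the process of schema evolution. It then designs a schema evolution tracking and querying method for time series data, formally describes the overall framework and key steps of schema tracking and cross-schema-version querying, and implements and tests the method on the time series database Apache IoTDB. Finally, the performance of the implemented system is analyzed and future research directions are discussed.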

Keywords: time series database, time series data, schema evolution, multi-schema-version data, query rewriting

Abstract: In the context of the Internet of Things and big data, vast numbers of sensors generate massive amounts of time series data on a daily basis. The fast iteration of software releases leads to frequent changes to the schema of these time series, which makes the problem of managing schema evolution for time series increasingly prominent. Schema evolution requires the management of each version of the data schema, so that no information is lost during schema modification and data can be accessed across multiple schema versions. Existing timeseries database management systems have limited support for schema evolution, even though schema evolution may occur frequently in this setting. State-of-the-art research and technology for schema evolution mainly focus on relational databases and struggle with complicated integrity constraints, which are more flexible in timeseries databases. This paper compares various databases with regard to schema evolution, provides a formal definition of time series and their schemas, and analyzes the process of schema evolution. It then designs a data-centric schema evolution tracking and querying system, discusses the key problems of schema tracking and cross-schema-version querying in detail, and implements and tests the system on the timeseries database Apache IoTDB. Finally, the performance of the system is evaluated and future research directions are discussed.

Key words: timeseries database, timeseries data, schema evolution, multi-schema-version database, query rewrite

CLC number: