Key Techniques of Time Series Databases: A Survey

刘帅; 乔颖; 罗雄飞; 赵怡婧; 王宏安

doi:10.7544/issn1000-1239.202330536

Abstract: With the continued development of the industrial Internet of things (IIoT), a growing number of devices and sensors are being connected to networks and are generating massive volumes of time series data. This explosive growth poses new challenges for database management systems: sustained high-throughput data ingestion, low-latency multidimensional queries, high-performance time series indexing, and low-cost data storage. In recent years, time series database (TSDB) technology has become an active research topic. Researchers have studied it in depth, and a number of databases built specifically for managing time series data have emerged, been deployed in many fields, and become an indispensable component of the IIoT. Existing surveys of time series databases focus on comparing functionality and performance and on recommending databases for specific domains; they give little attention to key techniques such as persistent storage, querying, computation, and indexing. These surveys were also published some time ago and therefore do not cover the key techniques of modern time series databases. We conduct a comprehensive survey of academic research on time series data storage and of industrial time series databases, and distill four categories of key techniques: 1) time series index optimization; 2) in-memory data organization; 3) high-throughput data ingestion and low-latency data querying; 4) low-cost storage of massive historical data. We also analyze and summarize existing TSDB benchmarks. Finally, we discuss future directions for the key techniques of time series databases.
