Abstract:
In the context of the Internet of Things and big data, vast numbers of sensors generate massive amounts of time series data every day. Rapid iteration of software releases leads to frequent changes to the schemas of these time series, which makes managing their schema evolution an increasingly prominent problem. Schema evolution requires managing every version of a data schema, so that no information is lost when a schema is modified and data can be accessed across multiple schema versions. Existing time series database management systems offer limited support for schema evolution, even though schema changes may occur frequently in this setting. State-of-the-art research and techniques for schema evolution focus mainly on relational databases and must contend with complicated integrity constraints, which are far more relaxed in time series databases. This paper compares various databases with regard to schema evolution, provides a formal definition of time series and their schemas, and analyzes the process of schema evolution. It then designs a data-centric system for tracing schema evolution and querying across schema versions, discusses in detail the key problems of schema tracking and cross-schema-version querying, and implements and tests the system on the time series database Apache IoTDB. Finally, the performance of the system is evaluated, and future research directions are discussed.