Abstract:
Indexing large time series databases is crucial for efficient search of time series queries. An index scheme RQI (range query based on index) is introduced, which includes three filtering methods: first-k filtering, indexing lower bounding and upper bounding as well as triangle inequality pruning. The basic idea is calculating wavelet coefficients for each time series based on Haar wavelet transform. The first k coefficients are used to form a MBR (minimal bounding rectangle). Thus, the point filtering method can be used. In addition, lower bounding and upper bounding features of each time series are calculated in advance, which are stored into index structure. Finally, triangle inequality pruning method is adopted by calculating the distance between time series beforehand. Moreover, a new lower bounding distance function SLBS (symmetrical lower bounding based on segment) and a novel clustering algorithm CSA (clustering based on segment approximation) are proposed. The aim is to further improve the search efficiency of point filtering method by keeping a good clustering trait of index structure, thereby obtaining a better algorithm RQIC (range query based on Index and clustering). A lot of compared experiments are conducted over both synthetic and real data sets for RQIC. The results show that RQIC provides perfect pruning ability and high search efficiency.