高级检索

    RQIC: 一种高效时序相似搜索算法

    RQIC: An Efficient Similarity Searching Algorithm on Time Series

    • 摘要: 索引大规模时序数据库是高效时序搜索中的关键问题.提出了一种新颖的索引方案RQI, 它包括3种过滤策略: 即first-k过滤、索引低边界和上边界以及三角不等式修剪.基本的思想为首先运用Haar小波变换计算每个时序的小波系数,利用前面的k个小波系数形成一个最小边界矩阵,以利用点过滤方法; 然后将预先计算每个时序的低边界特征和上边界特征存放到索引当中; 最后采用三角不等式来修剪不相似的序列并确保没有漏报.同时提出了一种新的低边界距离函数SLBS和聚类算法CSA.通过CSA可保持索引良好的聚类特征以提高点过滤方法的效率,从而引入了一种更好的算法RQIC.在合成数据集和实时数据集的大量对比实验表明,RQIC是有效的且具备较高的查询效率.

       

      Abstract: Indexing large time series databases is crucial for efficient search of time series queries. An index scheme RQI (range query based on index) is introduced, which includes three filtering methods: first-k filtering, indexing lower bounding and upper bounding as well as triangle inequality pruning. The basic idea is calculating wavelet coefficients for each time series based on Haar wavelet transform. The first k coefficients are used to form a MBR (minimal bounding rectangle). Thus, the point filtering method can be used. In addition, lower bounding and upper bounding features of each time series are calculated in advance, which are stored into index structure. Finally, triangle inequality pruning method is adopted by calculating the distance between time series beforehand. Moreover, a new lower bounding distance function SLBS (symmetrical lower bounding based on segment) and a novel clustering algorithm CSA (clustering based on segment approximation) are proposed. The aim is to further improve the search efficiency of point filtering method by keeping a good clustering trait of index structure, thereby obtaining a better algorithm RQIC (range query based on Index and clustering). A lot of compared experiments are conducted over both synthetic and real data sets for RQIC. The results show that RQIC provides perfect pruning ability and high search efficiency.

       

    /

    返回文章
    返回