Advanced Search
    Dai Dongbo, Tang Chunlei, Qiu Boren, Xiong Yun, and Zhu Yangyong. An Algorithm for Sequence Similarity Query with Optimized Multiple Filtering[J]. Journal of Computer Research and Development, 2010, 47(10): 1785-1796.
    Citation: Dai Dongbo, Tang Chunlei, Qiu Boren, Xiong Yun, and Zhu Yangyong. An Algorithm for Sequence Similarity Query with Optimized Multiple Filtering[J]. Journal of Computer Research and Development, 2010, 47(10): 1785-1796.

    An Algorithm for Sequence Similarity Query with Optimized Multiple Filtering

    • Sequence data is an important data type, ubiquitous in many domains such as text, Web access log and biological database. Similarity query in this kind of data is a very important means for extracting useful information. One key factor for high performance of similarity query in huge sequence database is the filtering level of query algorithm,namely, designing those filters that can quickly filter out the unpromising strings for query string is very important. Combining metric property of sequences distance with sequences characteristics, an algorithm called SSQ_MF based on multiple filters is proposed, whose filtering level is further improved compared with those query algorithms with a single filter. The filters used in SSQ_MF are length filter,prefix filter, and reference-based filter. Then, the statistical information about the string database is pre-computed and some data structures are used to store those information. Furthermore, every filters filtering size is effectively estimated and a model in which optimal filtering order is determined by every filters filtering size is built. Comprehensive experimental results demonstrate that in terms of query performance, SSQ_MF is better than algorithms with a single filter and algorithms with multiple filters but executing in a random order.
    • loading

    Catalog

      Turn off MathJax
      Article Contents

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return