An Algorithm for Sequence Similarity Query with Optimized Multiple Filtering

Dai Dongbo, Tang Chunlei, Qiu Boren, Xiong Yun, and Zhu Yangyong

Dai Dongbo, Tang Chunlei, Qiu Boren, Xiong Yun, and Zhu Yangyong. An Algorithm for Sequence Similarity Query with Optimized Multiple FilteringJ. Journal of Computer Research and Development, 2010, 47(10): 1785-1796.

Citation:

An Algorithm for Sequence Similarity Query with Optimized Multiple Filtering

Dai Dongbo, Tang Chunlei, Qiu Boren, Xiong Yun, and Zhu Yangyong

Graphical Abstract

Abstract

Abstract

Sequence data is an important data type, ubiquitous in many domains such as text, Web access log and biological database. Similarity query in this kind of data is a very important means for extracting useful information. One key factor for high performance of similarity query in huge sequence database is the filtering level of query algorithm，namely, designing those filters that can quickly filter out the unpromising strings for query string is very important. Combining metric property of sequences distance with sequences characteristics, an algorithm called SSQ_MF based on multiple filters is proposed, whose filtering level is further improved compared with those query algorithms with a single filter. The filters used in SSQ_MF are length filter,prefix filter, and reference-based filter. Then, the statistical information about the string database is pre-computed and some data structures are used to store those information. Furthermore, every filters filtering size is effectively estimated and a model in which optimal filtering order is determined by every filters filtering size is built. Comprehensive experimental results demonstrate that in terms of query performance, SSQ_MF is better than algorithms with a single filter and algorithms with multiple filters but executing in a random order.

FullText(HTML)

References (0)

Cited By

Turn off MathJax

Article Contents

An Algorithm for Sequence Similarity Query with Optimized Multiple Filtering

Abstract

Catalog

Export File

Citation

Format

Content