Advanced Search
    Wang Jinbao, Gao Hong, Li Jianzhong, Yang Donghua. Processing String Similarity Search in External Memory Efficiently[J]. Journal of Computer Research and Development, 2015, 52(3): 738-748. DOI: 10.7544/issn1000-1239.2015.20130683
    Citation: Wang Jinbao, Gao Hong, Li Jianzhong, Yang Donghua. Processing String Similarity Search in External Memory Efficiently[J]. Journal of Computer Research and Development, 2015, 52(3): 738-748. DOI: 10.7544/issn1000-1239.2015.20130683

    Processing String Similarity Search in External Memory Efficiently

    • String similarity search is fundamental to various applications, such as data cleaning, spell checking, bioinformatics and information integration, since users tend to provide fuzzy queries in these applications due to input errors of both queried strings and data strings. With the rapid growth of data size, big string datasets become ubiquitous, and almost every modern information system stores data presented in string format. String similarity search should be well supported in modern information systems. Memory based q-gram inverted indexes fail to support string similarity search over big string datasets due to the memory constraint, and it can no longer work if the data size grows larger than memory size. Existing external memory method, Behm-Index, only supports length-filter and prefix filter. This paper proposes LPA-Index to reduce I/O cost for better query response time. It is a disk resident index which suffers no limitation on data size compared to memory size. LPA-Index supports both length-filter and positional filter which reduce query candidates efficiently, and it selectively reads inverted lists during query processing for better I/O performance. Experiment results on both real datasets and a synthetic dataset demonstrate the efficiency of LPA-Index and its advantages over existing state of art disk index Behm-Index with regard to I/O cost and query response time.
    • loading

    Catalog

      Turn off MathJax
      Article Contents

      /

      DownLoad:  Full-Size Img  PowerPoint
      Return
      Return