Abstract:
Finding the longest common substring (LCSstr) of two given strings is an important problem in string analysis. It is used in many applications, such as approximate string matching, biological sequence analysis, plagiarism detection, and computer virus signature detection. Two common approaches solve the longest common substring problem: dynamic programming (LCSstrDP) and the suffix array (LCSstrSA). LCSstrDP solves the LCSstr problem by calculating the longest common suffix of the prefixes (comparing from right to left). Its code is simple, but its efficiency is low. LCSstrSA calculates the longest common prefix of the suffixes (comparing from left to right). LCSstrSA runs in linear time, but it is more complex to implement. In this paper, we propose two LCSstr algorithms based on bi-directional comparison, named LCSstrSeL and LCSstrSCeL. LCSstrSeL skips over the length of the LCSstr found so far; its code is simple, and it is significantly more efficient than LCSstrDP. Building on LCSstrSeL, LCSstrSCeL adds several mechanisms, such as character spanning and continuous same-value segment spanning. Test results show that the proposed algorithm not only has a lower memory overhead than LCSstrSA but also achieves higher average efficiency on small and medium strings. In some cases, its computational efficiency can be sublinear.
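To make the dynamic programming baseline concrete, the following is a minimal sketch of the textbook longest-common-suffix recurrence that the description of LCSstrDP refers to. It is not the implementation evaluated in this paper: the function name `lcsstr_dp` and the two-row table layout are assumptions made for illustration only.

```python
def lcsstr_dp(a: str, b: str) -> str:
    """Return one longest common substring of a and b.

    Illustrative textbook DP (hypothetical name, not the paper's code).
    curr[j] holds the length of the longest common suffix of the
    prefixes a[:i] and b[:j]; the maximum over all (i, j) is the answer.
    """
    best_len = 0   # length of the best match found so far
    best_end = 0   # exclusive end index of that match in a
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the common suffix ending one character earlier.
                curr[j] = prev[j - 1] + 1
                if curr[j] > best_len:
                    best_len = curr[j]
                    best_end = i
        prev = curr
    return a[best_end - best_len:best_end]
```

Every pair of positions is compared, so the sketch takes time proportional to the product of the two string lengths, which is the inefficiency the proposed algorithms aim to reduce.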