ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (9): 1976-1991.doi: 10.7544/issn1000-1239.2015.20140479

Previous Articles     Next Articles

Similarity Query Processing Algorithm over Data Stream Based on LCSS

Wang Shaopeng1, Wen Yingyou1,2, Zhao Hong1,2   

  1. 1(College of Information Science and Engineering, Northeastern University, Shenyang 110819); 2(Key Laboratory of Medical Image Computing(Northeastern University), Ministry of Education, Shenyang 110819)
  • Online:2015-09-01

Abstract: Nowadays similarity query about data stream has been essential in many applications, like smart home and environmental monitoring. However, few of the current relevant researches take LCSS (longest common subsequence) as the similarity measurement function. The NAIVE algorithm gets the query results by comparing the threshold and the value of measurement function which is obtained based on the basic dynamic programming method. The similarity query over data stream based on the LCSS is considered in this paper. The D2S-PC algorithm is proposed to overcome the drawback that the query result cannot be gotten until the calculations on all the elements in the full dynamic programming matrix are finished. It defines the PS and CC domains of the matrix over every window, and utilizes the characteristics of the similarity query and matrix members in these two domains effectively. By taking this algorithm, the similarity query results can be obtained before the final length of LCSS is calculated. Compared with the original algorithm, it reduces the computations about the members in the matrix greatly. Extensive experiments on real and synthetic datasets show that the D2S-PC algorithm is effective in handling the similarity query over data stream based on the LCSS in the condition of more precise query results, and can meet the requirements of practical applications.

Key words: data stream, similarity query, data distortion, longest common subsequence (LCSS), dynamic programming method

CLC Number: