ISSN 1000-1239 CN 11-1777/TP

计算机研究与发展 ›› 2015, Vol. 52 ›› Issue (9): 1976-1991.doi: 10.7544/issn1000-1239.2015.20140479

• 软件技术 • 上一篇    下一篇

基于LCSS的数据流相似性查询处理算法研究

王少鹏1, 闻英友1,2, 赵宏1,2   

  1. 1(东北大学信息科学与工程学院 沈阳 110819); 2(医学影像计算教育部重点实验室(东北大学) 沈阳 110819) (wangshaopeng1984@163.com)
  • 出版日期: 2015-09-01
  • 基金资助: 
    基金项目:国家自然科学基金项目(60903159,61173153);中央高校基本科研业务费专项资金项目(N110818001,N100218001);沈阳市科技计划基金项目(1091176-1-00)

Similarity Query Processing Algorithm over Data Stream Based on LCSS

Wang Shaopeng1, Wen Yingyou1,2, Zhao Hong1,2   

  1. 1(College of Information Science and Engineering, Northeastern University, Shenyang 110819); 2(Key Laboratory of Medical Image Computing(Northeastern University), Ministry of Education, Shenyang 110819)
  • Online: 2015-09-01

摘要: 数据流相似性查询广泛应用于智能家居、环境监测等领域.当前以LCSS(longest common subsequence)作为相似性测度函数的研究并不多.NAIVE算法使用基本动态规划方法计算测度函数值,通过该值与相似阈值的比较得到查询结果,对基于LCSS的数据流相似性查询问题进行研究.针对NAIVE算法必须在动态规划矩阵所有成员取值的计算完成后才能得到查询结果的缺点,提出了一种基于PS(possible solution)-CC(column critical)域优化策略的数据流相似性查询处理算法.该算法划定了每个窗口上动态规划矩阵的PS域和CC域,很好地利用了这2个域中成员所具有的性质和相似性查询的特点,无须获得测度函数的最终值便可得到查询结果,省略了很多矩阵成员的计算.实验部分证明了该算法的有效性,与同类算法相比,在处理具有更高精度结果要求的查询时效果更好.

关键词: 数据流, 相似性查询, 数据畸变, 最长公共子序列, 动态规划方法

Abstract: Nowadays similarity query about data stream has been essential in many applications, like smart home and environmental monitoring. However, few of the current relevant researches take LCSS (longest common subsequence) as the similarity measurement function. The NAIVE algorithm gets the query results by comparing the threshold and the value of measurement function which is obtained based on the basic dynamic programming method. The similarity query over data stream based on the LCSS is considered in this paper. The D2S-PC algorithm is proposed to overcome the drawback that the query result cannot be gotten until the calculations on all the elements in the full dynamic programming matrix are finished. It defines the PS and CC domains of the matrix over every window, and utilizes the characteristics of the similarity query and matrix members in these two domains effectively. By taking this algorithm, the similarity query results can be obtained before the final length of LCSS is calculated. Compared with the original algorithm, it reduces the computations about the members in the matrix greatly. Extensive experiments on real and synthetic datasets show that the D2S-PC algorithm is effective in handling the similarity query over data stream based on the LCSS in the condition of more precise query results, and can meet the requirements of practical applications.

Key words: data stream, similarity query, data distortion, longest common subsequence (LCSS), dynamic programming method

中图分类号: