ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2014, Vol. 51 ›› Issue (11): 2518-2527. doi: 10.7544/issn1000-1239.2014.20130869

• Software Technology •

An Adaptive Grid-Density Based Data Stream Clustering Algorithm Based on Uncertainty Model

Liu Zhuo1, Yang Yue2, Zhang Jianpei2, Yang Jing2, Chu Yan2, Zhang Zebao2

  1 (College of Automation, Harbin Engineering University, Harbin 150001); 2 (College of Computer Science and Technology, Harbin Engineering University, Harbin 150001) (liuzhuo@hrbeu.edu.cn)
  • Published: 2014-11-01
  • Online: 2014-11-01
  • Supported by:
    National Natural Science Foundation of China (61202274); China Postdoctoral Science Foundation (2012M510927); Heilongjiang Postdoctoral Science Foundation (LBH-Z12066); Fundamental Research Funds for the Central Universities (HEUCF100602)

Abstract: With the development and application of computing and sensing technologies, a new form of data, the uncertain data stream, has emerged in many fields and attracted the attention of many researchers. Existing data stream clustering techniques generally ignore uncertainty, which often makes the clustering results unreasonable or even unusable. The two aspects of uncertainty, existence-level and attribute-level, can both significantly affect the clustering process and its results, yet the few existing methods that address uncertainty consider only one of them. Most of these methods are also based on the K-Means algorithm, which has inherent shortcomings. To address this problem, an adaptive density-based clustering algorithm over uncertain data streams, ADC-UStream, is proposed under an uncertainty model. The algorithm treats existence-level and attribute-level uncertainty under a unified strategy and builds an entropy-based uncertainty model to measure them jointly. It adopts a grid-density clustering approach and, based on a decay window model, designs temporally and spatially adaptive density thresholds to suit the temporal nature and non-uniform distribution of uncertain data streams. Experimental results show that ADC-UStream performs well in both clustering quality and clustering efficiency.

Key words: uncertainty, data stream, clustering, grid-density, adaptive density threshold, uncertainty model
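
The abstract names the ingredients of ADC-UStream (an entropy-based uncertainty measure, grid densities under a decay window, and an adaptive density threshold) but gives no formulas. The Python sketch below is an illustration only, not the authors' method. Every name and formula in it is an assumption: `certainty_weight` scales a tuple's existence probability by one minus the mean normalized entropy of its attribute distributions, `DecayedGrid` decays each cell by `decay ** elapsed_time`, and the threshold is simply a multiple of the current mean cell density.

```python
import math
from collections import defaultdict


def normalized_entropy(probs):
    """Shannon entropy of a discrete distribution, scaled to [0, 1]."""
    if len(probs) < 2:
        return 0.0
    h = -sum(p * math.log2(p) for p in probs if p > 0)
    return h / math.log2(len(probs))


def certainty_weight(exist_prob, attr_dists):
    """Hypothetical weight in [0, 1]: existence probability scaled down by
    the mean normalized entropy of the attribute distributions."""
    mean_h = sum(normalized_entropy(d) for d in attr_dists) / len(attr_dists)
    return exist_prob * (1.0 - mean_h)


class DecayedGrid:
    """Grid cells whose densities decay by `decay ** elapsed_time` (assumed form)."""

    def __init__(self, cell_size, decay):
        self.cell_size = cell_size
        self.decay = decay
        # cell key -> (density at last update, time of last update)
        self.cells = defaultdict(lambda: (0.0, 0))

    def add(self, point, weight, t):
        """Decay the cell containing `point` up to time t, then add `weight`."""
        key = tuple(int(math.floor(x / self.cell_size)) for x in point)
        density, last = self.cells[key]
        self.cells[key] = (density * self.decay ** (t - last) + weight, t)
        return key

    def density(self, key, t):
        """Density of a cell decayed to time t (0.0 for an unseen cell)."""
        density, last = self.cells.get(key, (0.0, t))
        return density * self.decay ** (t - last)

    def dense_cells(self, t, factor=1.0):
        """Cells whose decayed density exceeds `factor` times the current mean
        density of non-empty cells (a stand-in for an adaptive threshold)."""
        current = {k: self.density(k, t) for k in self.cells}
        if not current:
            return []
        threshold = factor * sum(current.values()) / len(current)
        return sorted(k for k, d in current.items() if d > threshold)
```

In this sketch a tuple that is unlikely to exist, or whose attribute values are spread evenly over many alternatives, adds little to its cell's density, and because the threshold is recomputed from the current decayed densities it shifts as the stream evolves. A real implementation would follow the definitions in the full paper.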

CLC number: