高级检索

    基于核密度估计的分布数据流离群点检测

    Finding Outliers in Distributed Data Streams Based on Kernel Density Estimation

    • 摘要: 基于数据流数据的挖掘算法研究受到了越来越多的重视.针对分布式数据流环境,提出基于核密度估计的分布数据流离群点检测算法.算法将各分布节点上的数据流作为全局数据流的子集,通过分布节点与中心节点的通信,维护基于全局数据流的分布密度估计.各分布节点基于该估计对其上的分布数据流进行离群点检测,从而得到基于全局数据流的离群点集合.对节点之间的交互以及离群点检测算法的细节进行了讨论.通过实验验证了算法的适用性和有效性.

       

      Abstract: Recently, there has been occurring more and more applications based on data stream models. Data mining in data stream, such as clustering, classifying, etc, becomes a hot research field. This paper presents an algorithm for outlier detection in distributed data streams. The data stream on every distributed node is taken for a subset of the global data stream, which consists of data on all distributed nodes. Because of huge network traffic, it is impossible to send all data to a central node and do detection. Based on the communication of distribution information between distributed nodes and the central node, the algorithm maintains the density estimation for the union of all streams. On every distributed node, global outliers can be detected by the estimation. Details of communication schedule and outlier detection are also discussed in this paper. Experimental results show promising availabilities of the approach.

       

    /

    返回文章
    返回