Abstract:
Recently, there has been occurring more and more applications based on data stream models. Data mining in data stream, such as clustering, classifying, etc, becomes a hot research field. This paper presents an algorithm for outlier detection in distributed data streams. The data stream on every distributed node is taken for a subset of the global data stream, which consists of data on all distributed nodes. Because of huge network traffic, it is impossible to send all data to a central node and do detection. Based on the communication of distribution information between distributed nodes and the central node, the algorithm maintains the density estimation for the union of all streams. On every distributed node, global outliers can be detected by the estimation. Details of communication schedule and outlier detection are also discussed in this paper. Experimental results show promising availabilities of the approach.