Abstract:
The main aim of data stream subspace clustering is to find clusters in subspaces accurately and within a reasonable time. Existing data stream subspace clustering algorithms are strongly influenced by their parameters. Generally, the number of clusters or the feature subspace must be predefined, and the clustering result cannot accurately describe changes in the data stream. Furthermore, these algorithms cannot accurately track changes in the clusters themselves, which degrades the clustering result. To address these flaws, we propose a new data stream subspace clustering algorithm, SC-RP, in which the number of clusters and the feature subspace need not be predefined. SC-RP clusters quickly and is insensitive to outliers. When the data stream changes, the changes are recorded in a data structure named Region-tree, and the corresponding statistical information is updated, so that SC-RP can adjust its clustering results in a timely manner. Experiments on real and synthetic datasets show that SC-RP outperforms existing data stream subspace clustering algorithms in both clustering precision and clustering speed, and that it scales well with the number of clusters and the number of dimensions.