Abstract:
Clustering is an important research in data mining. Clustering in large data sets becomes a nut with the accumulating of the data. Despite its simplicity and its linear time, a serial k-Means algorithm's time complexity remains expensive when it is applied to a large data set. Distributed clustering is an effective method to solve this problem. In this paper, the knowledge of vectors' inner product inequation is adopted to improve efficiency of the existing parallel k-Means algorithm(k-DMeans), and an effective distributed k-Means clustering algorithm k-DCBIP is proposed. Theoretical analysis and experimental results testify that k-DCBIP outperforms the algorithm k-DMeans, and it is effective and efficient.