ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (9): 1979-1989.doi: 10.7544/issn1000-1239.2016.20150592

Previous Articles     Next Articles

A Clustering Ensemble Algorithm for Incomplete Mixed Data

Shi Qianyu, Liang Jiye, Zhao Xingwang   

  1. (School of Computer and Information Technology, Shanxi University, Taiyuan 030006) (Key Laboratory of Computational Intelligence and Chinese Information Processing(Shanxi University), Ministry of Education, Taiyuan 030006)
  • Online:2016-09-01

Abstract: Cluster ensembles have recently emerged a powerful clustering analysis technology and caught high attention of researchers due to their good generalization ability. From the existing work, these techniques held great promise, most of which generate the final results for complete data sets with numerical attributes. However, real life data sets are usually incomplete mixed data described by numerical and categorical attributes at the same time. And these existing algorithms are not very effective for an incomplete mixed data set. To overcome this deficiency, this paper proposes a new clustering ensemble algorithm which can be used to ensemble final clustering results for mixed numerical and categorical incomplete data. Firstly, the algorithm conducts completion of incomplete mixed data using three different missing value filling methods. Then, a set of clustering solutions are produced by executing K-Prototypes clustering algorithm on three different kinds of complete data sets multiple times, respectively. Next, a similarity matrix is constructed by considering all the clustering solutions. After that, the final clustering result is obtained by hierarchical clustering algorithms based on the similarity matrix. The effectiveness of the proposed algorithm is empirically demonstrated over some UCI real data sets and three benchmark evaluation measures. The experimental results show that the proposed algorithm is able to generate higher clustering quality in comparison with several traditional clustering algorithms.

Key words: clustering ensemble, incomplete data, mixed data, missing value imputation, K-Prototypes clustering algorithm

CLC Number: