ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2016, Vol. 53 ›› Issue (5): 1018-1028. doi: 10.7544/issn1000-1239.2016.20150131

An Attribute Weighted Clustering Algorithm for Mixed Data Based on Information Entropy

Zhao Xingwang, Liang Jiye

  1. (School of Computer and Information Technology, Shanxi University, Taiyuan 030006) (Key Laboratory of Computational Intelligence and Chinese Information Processing (Shanxi University), Ministry of Education, Taiyuan 030006)
  • Online: 2016-05-01

Abstract: In real applications, data sets that contain both numerical and categorical attributes are common, and clustering analysis for such mixed data has attracted growing attention. To address the problem of attribute weighting for high-dimensional mixed data, this paper proposes an attribute weighted clustering algorithm for mixed data based on information entropy. First, an extended Euclidean distance is defined for mixed data, which measures the difference between objects and clusters more accurately and objectively. Second, a generalized mechanism based on within-cluster entropy and between-cluster entropy is presented to uniformly assess the compactness and separation of clusters, and a measure of attribute importance is derived from this mechanism. On this basis, an attribute weighted clustering algorithm for mixed data is developed. The effectiveness of the proposed algorithm is demonstrated by comparison with widely used state-of-the-art clustering algorithms on ten real-life data sets from UCI. Finally, a statistical test is conducted to show the superiority of the results produced by the proposed algorithm.

Key words: clustering analysis, mixed data, attribute weighting, information entropy, dissimilarity measure
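
The abstract names two building blocks, an entropy measure and a weighted dissimilarity over mixed attributes, without giving their formulas. The sketch below is only a generic illustration of those two ideas and is not the authors' definitions: `shannon_entropy` and `weighted_mixed_distance` are hypothetical names, the categorical term uses simple 0/1 mismatch (an assumption), numerical attributes are assumed to be pre-scaled to a comparable range, and the weights are supplied by hand rather than derived from within-cluster and between-cluster entropy.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy (in bits) of a non-empty list of categorical values."""
    counts = Counter(values)
    n = len(values)
    # Sum of p * log2(1 / p) over the observed categories.
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def weighted_mixed_distance(x, y, numeric_idx, categorical_idx, weights):
    """Weighted dissimilarity between two mixed-type objects.

    Assumption (not from the paper): numerical attributes contribute a
    squared difference, categorical attributes contribute 0 on a match
    and 1 on a mismatch, and each term is scaled by its attribute weight.
    """
    total = 0.0
    for j in numeric_idx:
        total += weights[j] * (x[j] - y[j]) ** 2
    for j in categorical_idx:
        total += weights[j] * (0.0 if x[j] == y[j] else 1.0)
    return math.sqrt(total)
```

In this sketch, an attribute whose values are spread evenly inside a cluster has high entropy, which is the intuition behind giving it a low weight; how the paper actually combines within-cluster and between-cluster entropy into weights is not stated in the abstract.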
