Abstract:
In real applications, mixed data sets with both numerical attributes and categorical attributes at the same time are more common. Recently, clustering analysis for mixed data has attracted more and more attention. In order to solve the problem of attribute weighting for high-dimensional mixed data, this paper proposes an attribute weighted clustering algorithm for mixed data based on information entropy. The main work includes: an extended Euclidean distance is defined for mixed data, which can be used to measure the difference between the objects and clusters more accurately and objectively. And a generalized mechanism is presented to uniformly assess the compactness and separation of clusters based on within-cluster entropy and between-cluster entropy. Then a measure of the importance of attributes is given based on this mechanism. Furthermore, an attribute weighted clustering algorithm for mixed data based on information entropy is developed. The effectiveness of the proposed algorithm is demonstrated in comparison with the widely used state-of-the-art clustering algorithms for ten real life datasets from UCI. Finally, statistical test is conducted to show the superiority of the results produced by the proposed algorithm.