Abstract:
Typical clustering algorithms output a single partition of the data. However, in real-world applications, data can often be interpreted in many different ways and admit different reasonable partitions from multiple views. Instead of committing to one clustering solution, we introduce a novel algorithm, NrMIB (non-redundant multi-view information bottleneck), which provides the user with several non-redundant clustering solutions from multiple views. Our approach employs the information bottleneck (IB) method, which aims to maximize the relevant information preserved by the clustering results, to ensure the quality of the clustering solutions, while the mutual information between the clustering labels and the known data partitions is minimized to ensure that the new clustering solutions are non-redundant. By adopting mutual information and the MeanNN differential entropy to estimate the preserved information, NrMIB can be used to analyze both co-occurrence data and Euclidean space data. The algorithm is also suited to high-dimensional data and can discover both linear and non-linear cluster shapes. We perform experiments on pattern recognition with synthetic data, face recognition, and document clustering to assess our method against a wide range of clustering algorithms from the literature. The experimental results show that the proposed NrMIB algorithm can discover the multiple reasonable partitions present in the data, and that NrMIB outperforms the three non-redundant multi-view clustering algorithms examined here.
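
The non-redundancy criterion mentioned above can be illustrated with a small sketch. This is not the NrMIB algorithm itself; it only computes the empirical mutual information between a candidate labeling and a known partition, which is the quantity the abstract says is minimized. The function name `mutual_information` and the toy label lists are hypothetical and chosen for illustration only.

```python
from collections import Counter
from math import log


def mutual_information(labels_a, labels_b):
    """Empirical mutual information (in nats) between two labelings
    of the same items, estimated from their joint label counts."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n == 0:
        raise ValueError("labelings must be non-empty")

    joint = Counter(zip(labels_a, labels_b))  # counts of (a, b) pairs
    count_a = Counter(labels_a)               # marginal counts of a
    count_b = Counter(labels_b)               # marginal counts of b

    mi = 0.0
    for (a, b), n_ab in joint.items():
        # p(a, b) * log( p(a, b) / (p(a) * p(b)) )
        mi += (n_ab / n) * log(n_ab * n / (count_a[a] * count_b[b]))
    return mi


# Toy data (hypothetical): one known partition and two candidates.
known = [0, 0, 1, 1]
redundant = [1, 1, 0, 0]    # same grouping as `known`, labels renamed
alternative = [0, 1, 0, 1]  # grouping that cuts across `known`

print(mutual_information(known, redundant))
print(mutual_information(known, alternative))
```

In this toy case the renamed copy of the known partition shares all of its information with it (mutual information equal to log 2 nats), while the cross-cutting labeling shares none, so a criterion that penalizes this quantity would favor the second candidate as a non-redundant view.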