A Fast Clustering Algorithm for Information Retrieval
-
Graphical Abstract
-
Abstract
Due to the fast advance of information retrieval technique, information overload has become a headache problem to Internet users. In order to alleviate user's inconvenience to distinguish useful information from massive junk information, the research for improving retrieval system has gradually become hotter and hotter. Up to now, many techniques have been proposed for automatically categorizing and organizing Web information for users. Among them, clustering is one of the most extensively employed tools. Through clustering retrieval information, Internet users can quickly find out where their interesting retrieval results locate. Unfortunately, traditional clustering algorithms are either ineffective or inefficient for this task. As a result, a novel algorithm specially designed for clustering retrieval information is proposed. This algorithm applies maximum-minimum principle to extract accumulation points to form initial clusters at first. Experiment results show that, this initial cluster partitioning is approximate to the optimal partitioning and only needs small iterative adjustment steps to get convergence. After that, it iteratively adjusts feature set of each cluster to let cluster partitioning more and more precise. Simultaneously, it hierarchically separates the clusters, which don't meet convergence condition, into some sub-clusters to possess the merit of hierarchically representing information. Experiment results also demonstrate that time complexity of this algorithm is close to the recent techniques for clustering retrieval information. Besides, because of iteratively adjusting feature sets, it enables clustering results to be more precise and reasonable.
-
-