Abstract:
Image annotation plays an important role in content-based image retrieval. Because annotating images is expensive, researchers have proposed many methods that exploit large amounts of unlabeled data to improve classifier performance. Among these methods, label propagation has proven effective in many applications. With the proliferation of digital photography, the number of images is growing very rapidly; however, existing label propagation approaches cannot cope with real-world large-scale problems because they must construct a graph over the instances. In this paper, we propose a novel large-scale algorithm for image annotation, called RFLP, which combines the strengths of random forests and label propagation. We use a random forest because it scales and generalizes well, and because the locality of decision trees allows large-scale data to be compressed. First, RFLP reduces the large-scale problem to a small-scale one using random decision trees. Then, a traditional label propagation approach propagates labels on the compressed data efficiently. Finally, RFLP spreads the propagation results to all unlabeled instances using the random forest. Experimental results show that, compared with traditional label propagation methods, the proposed RFLP is effective and significantly less costly.
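The three steps can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the trees are grown, how data are compressed, or which label propagation variant is used. The sketch assumes random axis-aligned splits, compression of each leaf to its centroid, an iterative propagation with clamped seeds over a Gaussian affinity graph, and averaging of leaf scores across trees. All function names (`build_tree`, `propagate`, `rflp_sketch`) and parameters are hypothetical.

```python
import math
import random

def build_tree(X, idx, depth, rng, min_leaf):
    """Grow one random axis-aligned tree; return its leaves as lists of row indices."""
    if depth == 0 or len(idx) <= min_leaf:
        return [idx]
    d = rng.randrange(len(X[0]))                  # random split feature (assumption)
    vals = [X[i][d] for i in idx]
    t = rng.uniform(min(vals), max(vals))         # random split threshold (assumption)
    left = [i for i in idx if X[i][d] < t]
    right = [i for i in idx if X[i][d] >= t]
    if not left or not right:
        return [idx]
    return (build_tree(X, left, depth - 1, rng, min_leaf)
            + build_tree(X, right, depth - 1, rng, min_leaf))

def sqdist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def propagate(protos, seeds, n_classes, sigma, iters):
    """Iterative label propagation over leaf prototypes; seed leaves stay clamped."""
    n = len(protos)
    W = [[0.0 if i == j else math.exp(-sqdist(protos[i], protos[j]) / (2 * sigma ** 2))
          for j in range(n)] for i in range(n)]
    totals = [sum(row) or 1.0 for row in W]       # guard against isolated nodes
    F = [list(seeds.get(i, [0.0] * n_classes)) for i in range(n)]
    for _ in range(iters):
        new_F = []
        for i in range(n):
            if i in seeds:                        # keep known label distributions fixed
                new_F.append(list(seeds[i]))
            else:                                 # weighted average of neighbours' scores
                new_F.append([sum(W[i][j] * F[j][c] for j in range(n)) / totals[i]
                              for c in range(n_classes)])
        F = new_F
    return F

def rflp_sketch(X, labels, n_classes, n_trees=5, depth=12, min_leaf=10,
                sigma=2.0, iters=20, seed=0):
    """labels[i] is a class id, or None if row i is unlabeled. Returns one class per row."""
    rng = random.Random(seed)
    votes = [[0.0] * n_classes for _ in X]
    for _ in range(n_trees):
        leaves = build_tree(X, list(range(len(X))), depth, rng, min_leaf)
        # Step 1: compress the data, one centroid per leaf.
        protos = [[sum(X[i][d] for i in leaf) / len(leaf) for d in range(len(X[0]))]
                  for leaf in leaves]
        seeds = {}
        for k, leaf in enumerate(leaves):
            counts = [0.0] * n_classes
            for i in leaf:
                if labels[i] is not None:
                    counts[labels[i]] += 1.0
            if sum(counts) > 0:
                seeds[k] = [c / sum(counts) for c in counts]
        # Step 2: propagate labels on the small graph of leaf prototypes.
        F = propagate(protos, seeds, n_classes, sigma, iters)
        # Step 3: spread leaf scores back to every instance in that leaf.
        for k, leaf in enumerate(leaves):
            for i in leaf:
                for c in range(n_classes):
                    votes[i][c] += F[k][c]
    return [v.index(max(v)) for v in votes]

# Toy usage: two well-separated 2-D blobs with three labeled points each.
data_rng = random.Random(1)
X = ([[data_rng.gauss(0, 1), data_rng.gauss(0, 1)] for _ in range(100)]
     + [[data_rng.gauss(10, 1), data_rng.gauss(10, 1)] for _ in range(100)])
truth = [0] * 100 + [1] * 100
labels = [None] * 200
for i in (0, 1, 2):
    labels[i] = 0
for i in (100, 101, 102):
    labels[i] = 1
pred = rflp_sketch(X, labels, n_classes=2)
accuracy = sum(p == t for p, t in zip(pred, truth)) / len(truth)
```

In this sketch the propagation graph has one node per leaf rather than one per instance, which is where the cost saving would come from: the affinity matrix grows with the square of the number of leaves, not the number of images.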