    Web Image Clustering Based on Multiple Instance Learning

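    • Summary: Multiple instance learning (MIL) is an active research topic in image classification and automatic annotation systems. Most existing MIL algorithms are supervised. For the unsupervised setting, six multi-instance clustering algorithms are proposed under two frameworks, one based on the EM algorithm and one on heuristic iterative optimization, and they are used to cluster images from a real Web environment in order to analyze users' search interests. Because an image contains several regions, each region can be regarded as an instance, and the regions belonging to the same image form a bag. The problem of understanding the semantic content of an image is thereby transformed into a multiple instance learning problem. Comparative experiments on MUSK, the classic MIL benchmark data, and on a Web image set show that the proposed multi-instance clustering algorithms achieve good clustering performance.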

       

      Abstract: Multiple instance learning (MIL) has been studied actively in image retrieval and annotation systems. Since each image contains several regions and each region can be regarded as an instance, image retrieval can be cast as an MIL problem in which each image is viewed as a bag of semantic regions. The key assumption of MIL is that a bag is positive if at least one of its instances is positive; otherwise, the bag is negative. Most state-of-the-art methods solve the MIL problem in a supervised way. In contrast, this work proposes two unsupervised frameworks for clustering multi-instance (MI) objects, one based on the expectation maximization (EM) approach and the other on iterative heuristic optimization. Under each framework, three new algorithms are introduced to find users' interests in specific Web images without any manually labeled data. The EM approach treats instances as members of concepts, each of which is modeled by a statistical process; a cluster of MI objects is then considered as a multinomial distribution over the components of the mixture model of instances. The second framework selects one instance from each MI object in every iteration to determine the clustering model of the MI objects, which transforms the multi-instance object clustering problem into a standard clustering problem. All the algorithms are evaluated on both the MUSK benchmark data sets and a real-world Web image data set downloaded from Yahoo. Comparative studies show the effectiveness of the proposed algorithms.
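
The iterative heuristic optimization idea described above (pick one instance per bag in each iteration, then cluster those picks as ordinary points) can be illustrated with a minimal sketch. This is not one of the paper's algorithms: the abstract does not give the selection rule or the underlying clustering model, so the nearest-to-center selection, the k-means-style mean update, the farthest-point initialization, and the name `cluster_bags` are all assumptions made for illustration.

```python
from math import dist  # Euclidean distance, Python 3.8+


def cluster_bags(bags, k, n_iter=20):
    """Cluster bags (lists of equal-length feature tuples) into k groups.

    Hypothetical sketch: each iteration selects one representative
    instance per bag, then updates centers from the representatives.
    """
    # Assumed initialization: deterministic farthest-point seeding.
    instances = [x for bag in bags for x in bag]
    centers = [bags[0][0]]
    while len(centers) < k:
        centers.append(
            max(instances, key=lambda x: min(dist(x, c) for c in centers))
        )

    labels = [None] * len(bags)
    for _ in range(n_iter):
        reps, new_labels = [], []
        # Selection step (assumed rule): the instance of each bag that lies
        # closest to any current center represents the bag.
        for bag in bags:
            _, inst, j = min(
                ((dist(x, centers[j]), x, j) for x in bag for j in range(k)),
                key=lambda t: t[0],
            )
            reps.append(inst)
            new_labels.append(j)

        # Update step: ordinary k-means mean over the chosen representatives.
        for j in range(k):
            members = [r for r, lab in zip(reps, new_labels) if lab == j]
            if members:  # keep the old center if a cluster is empty
                centers[j] = tuple(sum(col) / len(members) for col in zip(*members))

        if new_labels == labels:  # assignments are stable
            break
        labels = new_labels
    return labels, centers


# Toy data: four "images", each a bag of two 2-D region descriptors.
bags = [
    [(0.0, 0.0), (0.2, 0.1)],
    [(0.1, 0.0), (0.0, 0.3)],
    [(10.0, 10.0), (10.2, 9.9)],
    [(9.8, 10.1), (10.0, 10.3)],
]
labels, centers = cluster_bags(bags, k=2)
print(labels)
```

Because only one instance per bag enters the update step, the bag-level problem reduces to clustering ordinary points, which is the transformation the abstract describes; any standard clustering model could stand in for the mean update used here.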

       
