FAIDA: A Fast and Accurate Image Deduplication Approach

Chen Ming; Wang Shupeng; Yun Xiaochun; Wu Guangjun; Li Chao

Abstract: Deduplication improves storage utilization by eliminating redundant copies of data and replacing them with logical pointers to the single retained copy, and it is now widely used in backup and archive systems. However, most existing deduplication systems hash data chunks and compare the hashes to decide whether chunks are redundant. This bitstream-level exact match is too strict for many applications, such as image deduplication. To solve this problem, we present a fast and accurate image deduplication approach. We first define duplicate images according to the characteristics of Web images, and then divide image deduplication into two stages: duplicate image detection and duplicate image deduplication. In the first stage, we use perceptual hashing to improve image retrieval speed and multiple filters to improve retrieval accuracy. In the second stage, we use fuzzy logic reasoning, which simulates the process of human thinking, to select the proper centroid image from each duplicate image set. Experimental results show that the proposed approach not only detects duplicate images quickly and accurately, but also selects centroid images that meet users' perceptual requirements.
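The idea behind the detection stage can be illustrated with a small sketch. The abstract does not say which perceptual hash FAIDA uses, so the code below substitutes a generic average hash as a stand-in. The function names, the 8x8 hash size, and the Hamming-distance threshold of 5 are all assumptions made for illustration, not values from the paper. Images are represented as plain 2D lists of grayscale values so that the sketch needs no image library.

```python
def average_hash(pixels, hash_size=8):
    """Compute a generic average hash (an assumed stand-in, not FAIDA's hash).

    pixels is a 2D list of grayscale values with at least hash_size rows
    and hash_size columns. Returns an integer of hash_size * hash_size bits.
    """
    height, width = len(pixels), len(pixels[0])
    cells = []
    # Downsample to a hash_size x hash_size grid by averaging pixel blocks.
    for i in range(hash_size):
        r0, r1 = i * height // hash_size, (i + 1) * height // hash_size
        for j in range(hash_size):
            c0, c1 = j * width // hash_size, (j + 1) * width // hash_size
            block = [pixels[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    # Each bit records whether a cell is brighter than the overall mean.
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Count the bit positions in which two hashes differ."""
    return bin(a ^ b).count("1")


def is_candidate_duplicate(pixels_a, pixels_b, threshold=5):
    """Flag two images as candidate duplicates (threshold is an assumption)."""
    distance = hamming_distance(average_hash(pixels_a), average_hash(pixels_b))
    return distance <= threshold
```

Unlike a bitstream hash, this kind of hash changes little when an image is slightly brightened or recompressed, so near-identical images land within a small Hamming distance of each other. In a pipeline like the one the abstract describes, such a check would only produce candidates, which further filters would then confirm or reject.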
