Abstract:
Online cross-modal hashing has received increasing attention for its efficiency and effectiveness in retrieving cross-modal streaming data. Despite promising progress, most existing methods rely on accurate and clean supervised information. Only a few online cross-modal hashing studies address the unsupervised setting, and numerous challenges remain. For example, streaming data usually suffer from an unbalanced distribution because each chunk contains only a limited volume of data. Most existing methods neglect this problem, which makes them sensitive to outlier samples and compromises their robustness. Moreover, existing models typically exploit the global data distribution while ignoring local neighborhood information that can promote hash learning. To solve these problems, we propose an unsupervised online cross-modal hashing method with double structure preservation, called SPOCH (Structure-Preserving Online Cross-modal Hashing). It simultaneously explores the global and local structures to generate a common representation, which then guides the hash learning process. For global structure preservation, we design a loss function based on the L2,1 norm, which alleviates sensitivity to outlier samples. For local structure preservation, we reconstruct sample representations from neighbor relations that integrate multi-modal information. In addition, to alleviate the forgetting problem, we propose joint optimization on streaming data and design a corresponding update strategy to improve training efficiency. We conducted experiments on two widely used cross-modal retrieval datasets. Compared with existing state-of-the-art unsupervised online cross-modal hashing methods, SPOCH achieves superior retrieval accuracy with comparable or even shorter training time, validating the effectiveness of the proposed approach.
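Why an L2,1-norm loss is less sensitive to outliers can be illustrated with a generic residual matrix E whose rows correspond to samples. This is a minimal sketch, not SPOCH's actual objective: the residual matrix and the helper names (`l21_norm`, `l21_grad`, `sq_fro_grad`) are hypothetical. The L2,1 norm sums the (unsquared) L2 norms of the rows, so each sample's gradient has unit length regardless of how large its residual is. Under the squared Frobenius norm, by contrast, the gradient grows linearly with the residual and an outlier row dominates the update.

```python
import numpy as np

def l21_norm(E):
    """L2,1 norm: sum of the L2 norms of the rows (one row per sample)."""
    return float(np.sum(np.linalg.norm(E, axis=1)))

def sq_frobenius(E):
    """Squared Frobenius norm: sum of all squared entries."""
    return float(np.sum(E ** 2))

def l21_grad(E, eps=1e-12):
    """Gradient of the L2,1 norm w.r.t. E: each row is scaled to unit length."""
    norms = np.linalg.norm(E, axis=1, keepdims=True)
    return E / np.maximum(norms, eps)  # eps guards against all-zero rows

def sq_fro_grad(E):
    """Gradient of the squared Frobenius norm w.r.t. E."""
    return 2.0 * E

# Hypothetical residuals for four samples; the last row is an outlier.
E = np.array([[0.1, 0.0],
              [0.0, 0.1],
              [0.1, 0.0],
              [3.0, 4.0]])

print("L2,1 loss:", l21_norm(E))
print("squared Frobenius loss:", sq_frobenius(E))
# Per-sample gradient magnitudes: how strongly each sample pulls the update.
print("L2,1 grad norms:", np.linalg.norm(l21_grad(E), axis=1))
print("Frobenius grad norms:", np.linalg.norm(sq_fro_grad(E), axis=1))
```

With the squared Frobenius loss the outlier's gradient is 50 times larger than that of each normal sample, whereas with the L2,1 loss every sample contributes a gradient of the same magnitude.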