Image Cross-Domain Translation Algorithm Based on Self-Similarity and Contrastive Learning

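Zhao Lei (赵磊); Zhang Huiming (张慧铭); Xing Wei (邢卫); Lin Zhijie (林志洁); Lin Huaizhong (林怀忠); Lu Dongming (鲁东明); Pan Xun (潘洵); Xu Duanqing (许端清)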

doi:10.7544/issn1000-1239.202220039

Abstract: Image cross-domain translation, also known as image translation, is a technique for converting images from a source domain into images of a target domain. Specifically, the generated image should keep the structure of the source-domain image (contour, pose, etc.) while taking on the style of the target-domain image (texture, color, etc.). The technique is widely used in computer vision, for example in photo editing and video special-effects production. In recent years it has developed rapidly on the basis of deep learning, especially generative adversarial networks (GANs), and has achieved impressive results. However, the translated images still suffer from problems such as color mode collapse and failure to preserve the content structure. To address these problems, we propose an image cross-domain translation algorithm based on self-similarity and contrastive learning. The algorithm uses a pre-trained deep neural network to extract the content and style features of images. The proposed network is trained with the perceptual loss and a self-similarity-based loss as the content loss, and with a relaxed optimal transport loss and a moment-matching loss as the style loss. In addition, contrastive learning is performed by labeling generated images and target-domain images as positive sample pairs, and generated images and source-domain images as negative sample pairs. Experiments on four datasets show that the proposed algorithm preserves the content structure of the source-domain images well, reduces color mode collapse, and makes the style of the generated images more consistent with that of the guidance images.
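The content term pairs a perceptual loss with a self-similarity loss. The sketch below is a minimal, non-authoritative illustration that assumes the formulation popularized by STROTSS (Kolkin et al., 2019). Each image is represented as n C-dimensional feature vectors taken from the same locations of a pre-trained network's feature maps; the choice of network is an assumption. The perceptual loss is a plain mean squared error, and the self-similarity loss compares column-normalized cosine-distance matrices. The paper's exact definitions may differ, and the weights `w_perc` and `w_ss` are hypothetical.

```python
import math
from typing import List

# Assumption: one image's features as n vectors of length C, e.g. sampled
# from the same spatial locations of a pre-trained CNN's feature maps.
Features = List[List[float]]


def cosine_distance(a: List[float], b: List[float], eps: float = 1e-8) -> float:
    """1 - cosine similarity; eps guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def self_distance_matrix(feats: Features) -> List[List[float]]:
    """D[i][j]: cosine distance between feature vectors i and j of one image."""
    return [[cosine_distance(fi, fj) for fj in feats] for fi in feats]


def self_similarity_loss(gen: Features, src: Features, eps: float = 1e-8) -> float:
    """Mean absolute difference of the column-normalized self-distance matrices."""
    n = len(gen)
    d_gen, d_src = self_distance_matrix(gen), self_distance_matrix(src)
    # Dividing column j by its sum keeps only the relative distances to location j.
    col_gen = [sum(d_gen[i][j] for i in range(n)) + eps for j in range(n)]
    col_src = [sum(d_src[i][j] for i in range(n)) + eps for j in range(n)]
    total = sum(
        abs(d_gen[i][j] / col_gen[j] - d_src[i][j] / col_src[j])
        for i in range(n)
        for j in range(n)
    )
    return total / (n * n)


def perceptual_loss(gen: Features, src: Features) -> float:
    """Mean squared error between corresponding feature vectors."""
    n, c = len(gen), len(gen[0])
    return sum((g - s) ** 2 for fg, fs in zip(gen, src) for g, s in zip(fg, fs)) / (n * c)


def content_loss(gen: Features, src: Features, w_perc: float = 1.0, w_ss: float = 1.0) -> float:
    """Weighted sum of the two content terms (hypothetical weights)."""
    return w_perc * perceptual_loss(gen, src) + w_ss * self_similarity_loss(gen, src)
```

Cosine distance ignores vector length, so uniformly rescaling the features leaves the self-similarity term near zero while the perceptual term grows. For example, doubling `[[1, 0], [0, 1], [1, 1]]` gives a perceptual loss of 4/6. The self-similarity term therefore constrains the relative layout of features rather than their exact values.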
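The style term combines a relaxed optimal transport loss with a moment-matching loss. The sketch below is a minimal illustration that assumes the STROTSS-style (Kolkin et al., 2019) definitions; the paper's exact formulas may differ. The relaxed earth mover's distance (REMD) replaces the full optimal-transport problem with the larger of the two one-sided nearest-neighbour matching costs under a cosine cost. Moment matching compares feature means and covariances with L1 norms. The feature format, function names, and weights `w_ot` and `w_mm` are hypothetical.

```python
import math
from typing import List, Tuple

# Assumption: features as n vectors of length C from a pre-trained network.
Features = List[List[float]]


def cosine_distance(a: List[float], b: List[float], eps: float = 1e-8) -> float:
    """1 - cosine similarity; eps guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b + eps)


def relaxed_emd(gen: Features, tgt: Features) -> float:
    """Relaxed EMD: max of the two one-sided nearest-neighbour matching costs."""
    cost = [[cosine_distance(g, t) for t in tgt] for g in gen]
    # Each generated feature is matched to its closest target feature...
    gen_to_tgt = sum(min(row) for row in cost) / len(gen)
    # ...and each target feature to its closest generated feature.
    tgt_to_gen = sum(min(row[j] for row in cost) for j in range(len(tgt))) / len(tgt)
    return max(gen_to_tgt, tgt_to_gen)


def mean_and_covariance(feats: Features) -> Tuple[List[float], List[List[float]]]:
    """Per-channel mean and (biased) channel covariance of n feature vectors."""
    n, c = len(feats), len(feats[0])
    mu = [sum(f[k] for f in feats) / n for k in range(c)]
    cov = [
        [sum((f[a] - mu[a]) * (f[b] - mu[b]) for f in feats) / n for b in range(c)]
        for a in range(c)
    ]
    return mu, cov


def moment_matching_loss(gen: Features, tgt: Features) -> float:
    """L1 gap between means (scaled by 1/C) plus between covariances (scaled by 1/C^2)."""
    c = len(gen[0])
    mu_g, cov_g = mean_and_covariance(gen)
    mu_t, cov_t = mean_and_covariance(tgt)
    mean_term = sum(abs(x - y) for x, y in zip(mu_g, mu_t)) / c
    cov_term = sum(abs(cov_g[a][b] - cov_t[a][b]) for a in range(c) for b in range(c)) / (c * c)
    return mean_term + cov_term


def style_loss(gen: Features, tgt: Features, w_ot: float = 1.0, w_mm: float = 1.0) -> float:
    """Weighted sum of the two style terms (hypothetical weights)."""
    return w_ot * relaxed_emd(gen, tgt) + w_mm * moment_matching_loss(gen, tgt)
```

Taking the max over both directions is what penalizes color mode collapse. If every generated feature lies near a single target feature, the generated-to-target cost stays small, but the target-to-generated cost grows because the other target features have no close match. With `tgt = [[1, 0], [0, 1]]` and `gen = [[1, 0], [1, 0]]`, `relaxed_emd(gen, tgt)` is about 0.5 and `moment_matching_loss(gen, tgt)` is 0.75.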
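The contrastive step can be sketched with an InfoNCE-style loss. This form is an assumption: the abstract states only that generated and target-domain images form positive pairs and that generated and source-domain images form negative pairs. In the sketch, `anchor` is a hypothetical embedding of a generated image, `positive` is a target-domain embedding, and `negatives` are source-domain embeddings. The temperature `tau` is an illustrative default, not a value from the paper.

```python
import math
from typing import List


def cosine_similarity(a: List[float], b: List[float], eps: float = 1e-8) -> float:
    """Cosine similarity; eps guards against zero-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def info_nce(
    anchor: List[float],
    positive: List[float],
    negatives: List[List[float]],
    tau: float = 0.07,  # hypothetical temperature
) -> float:
    """-log(exp(s+/tau) / (exp(s+/tau) + sum_k exp(s_k/tau))), as in InfoNCE (assumed form)."""
    logits = [cosine_similarity(anchor, positive) / tau]
    logits += [cosine_similarity(anchor, neg) / tau for neg in negatives]
    # Subtract the largest logit before exponentiating (log-sum-exp trick).
    m = max(logits)
    log_denominator = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_denominator - logits[0]
```

Minimizing this loss pulls the generated image's embedding toward the target domain and pushes it away from the source domain. Subtracting the largest logit keeps the exponentials from overflowing at small temperatures.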