    Xue Zhihang, Xu Zheming, Lang Congyan, Feng Songhe, Wang Tao, Li Yidong. Text-to-Image Generation Method Based on Image-Text Semantic Consistency[J]. Journal of Computer Research and Development, 2023, 60(9): 2180-2190. DOI: 10.7544/issn1000-1239.202220416

    Text-to-Image Generation Method Based on Image-Text Semantic Consistency

    In recent years, text-to-image generation methods based on generative adversarial networks (GANs) have become a popular research area in cross-media convergence. These methods aim to improve the semantic consistency between text descriptions and generated images by extracting more representative text and image features. Most existing methods model global image features together with the initial text semantic features, ignoring the limitations of those initial text features and failing to exploit the semantic consistency between generated images and text features as guidance, which reduces the representativeness of the text information in text-to-image synthesis. In addition, because the dynamic interaction among generated object regions is not considered, the generative network can only roughly delineate the target region and misses the potential correspondence between local image regions and the semantic labels of the text. To address these problems, this paper proposes a text-to-image generation method based on image-text semantic consistency, called ITSC-GAN. The model first designs a text information enhancement module that uses the generated images to enhance the text information, thereby improving the representational power of the text features. Second, the model proposes an image regional attention module that strengthens the representational ability of image features by mining the relationships among image sub-regions. By jointly applying these two modules, higher consistency between local image features and text semantic labels is achieved. Finally, the model uses the generator and discriminator loss functions as constraints to improve the quality of the generated images and their semantic agreement with the text description.
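The abstract does not specify how the image regional attention module is implemented. As a rough illustration of the general idea of relating image sub-regions to one another, the following is a minimal sketch of scaled dot-product self-attention over sub-region features; all names, shapes, and projection matrices here are hypothetical and not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def regional_self_attention(regions, w_q, w_k, w_v):
    """Scaled dot-product self-attention over image sub-region features.

    regions: (n_regions, d) matrix, one feature row per sub-region.
    w_q, w_k, w_v: (d, d_k) projection matrices (learned in practice).
    Returns (n_regions, d_k) features in which each region aggregates
    information from every other region, weighted by pairwise affinity.
    """
    q, k, v = regions @ w_q, regions @ w_k, regions @ w_v
    scores = q @ k.T / np.sqrt(k.shape[-1])   # pairwise region affinities
    attn = softmax(scores, axis=-1)           # each row sums to 1
    return attn @ v

rng = np.random.default_rng(0)
feats = rng.standard_normal((16, 32))         # e.g. 16 sub-regions, 32-dim features
w = [rng.standard_normal((32, 32)) * 0.1 for _ in range(3)]
out = regional_self_attention(feats, *w)
print(out.shape)  # (16, 32)
```

In a full model this operation would sit inside the generator and be trained end-to-end; the sketch only shows the data flow by which one sub-region's feature comes to depend on all the others.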
Experimental results show that, compared with the mainstream AttnGAN model, ITSC-GAN increases the IS (inception score) metric by about 7.42%, decreases the FID (Fréchet inception distance) by about 28.76%, and increases R-precision by about 14.95% on the CUB dataset. Extensive experimental results validate the effectiveness and superiority of the ITSC-GAN model.
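For readers unfamiliar with the IS metric reported above, it is defined as IS = exp(E_x[KL(p(y|x) ‖ p(y))]), where p(y|x) is a classifier's class distribution for a generated image (the Inception network in the standard protocol) and p(y) is the marginal over all generated images. A minimal sketch of that formula, with a toy probability matrix in place of real classifier outputs, could look like:

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Compute IS from per-image class probabilities.

    probs: (n_images, n_classes), each row p(y|x).
    IS = exp( mean_x KL( p(y|x) || p(y) ) ), with p(y) the column mean.
    Higher is better: it rewards confident per-image predictions
    (sharp rows) combined with diverse outputs (a flat marginal).
    """
    p_y = probs.mean(axis=0, keepdims=True)  # marginal p(y)
    kl = (probs * (np.log(probs + eps) - np.log(p_y + eps))).sum(axis=1)
    return float(np.exp(kl.mean()))

# A perfectly confident and perfectly diverse classifier attains the
# maximum IS = n_classes; a totally uniform one attains the minimum IS = 1.
print(inception_score(np.eye(10)))              # ~10.0
print(inception_score(np.full((10, 10), 0.1)))  # ~1.0
```

A real evaluation would feed thousands of generated images through a pretrained Inception network to obtain `probs`; the sketch only shows the statistic itself.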