A Cross-Modal Entity Linking Model Based on Contrastive Learning
Abstract
Image-text cross-modal entity linking is an extension of traditional named entity linking: the input is an image containing an entity, which must be linked to the corresponding textual entity in a knowledge base. Existing models usually adopt a dual-encoder architecture that encodes visual and textual entities into separate vectors, computes their similarity with a dot product, and links each image entity to the most similar textual entity. Training typically relies on a cross-modal contrastive learning objective: for an entity vector in one modality, it pulls closer the vector of the same entity in the other modality and pushes away the vectors of all other entities in that modality. However, this approach overlooks the difference in representation difficulty between the two modalities: visually similar entities are often harder to distinguish than textually similar ones, so the former are frequently linked incorrectly. To address this problem, we propose two new contrastive learning tasks that enhance the discriminative power of the vectors. The first, self-contrastive learning, improves the separation among visual vectors. The second, hard-negative contrastive learning, helps each textual vector distinguish between similar visual vectors. We conduct experiments on the open-source WikiPerson dataset. With a knowledge base of 120k entities, our model improves accuracy by 4.5% over the previous state-of-the-art model.
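To make the three objectives concrete, below is a minimal PyTorch-style sketch of how such losses could be implemented for a batch of paired visual and textual entity vectors. It assumes an InfoNCE-style formulation; the function names, the temperature, the number of hard negatives `k`, and the exact choice of positives and negatives are illustrative assumptions, not the paper's precise definitions.

```python
import torch
import torch.nn.functional as F

def cross_modal_loss(visual, textual, temperature=0.07):
    """Standard cross-modal contrastive loss: each visual vector is pulled
    toward its own textual vector and pushed away from the textual vectors
    of all other entities in the batch."""
    visual = F.normalize(visual, dim=-1)
    textual = F.normalize(textual, dim=-1)
    logits = visual @ textual.t() / temperature              # (B, B) similarity matrix
    targets = torch.arange(visual.size(0), device=visual.device)
    return F.cross_entropy(logits, targets)

def self_contrastive_loss(visual, temperature=0.07):
    """One possible self-contrastive term: penalizes high similarity between
    visual vectors of different entities, so visually similar entities stay
    distinguishable. The paper's exact positive/negative construction may differ."""
    visual = F.normalize(visual, dim=-1)
    sim = visual @ visual.t() / temperature
    mask = torch.eye(sim.size(0), dtype=torch.bool, device=sim.device)
    off_diag = sim.masked_fill(mask, float('-inf'))          # drop self-similarity
    return torch.logsumexp(off_diag, dim=-1).mean()

def hard_negative_loss(visual, textual, k=5, temperature=0.07):
    """Hard-negative contrastive term: each textual vector contrasts its own
    visual vector against the k most similar (hardest) visual vectors of
    other entities in the batch."""
    visual = F.normalize(visual, dim=-1)
    textual = F.normalize(textual, dim=-1)
    sim = textual @ visual.t() / temperature                  # (B, B)
    B = sim.size(0)
    pos = sim.diag().unsqueeze(1)                             # (B, 1) matching pairs
    eye = torch.eye(B, dtype=torch.bool, device=sim.device)
    neg = sim.masked_fill(eye, float('-inf'))                 # exclude the positive
    hard_neg, _ = neg.topk(k=min(k, B - 1), dim=-1)           # (B, k) hardest negatives
    logits = torch.cat([pos, hard_neg], dim=-1)
    targets = torch.zeros(B, dtype=torch.long, device=sim.device)  # positive at index 0
    return F.cross_entropy(logits, targets)
```

In practice the three terms would be combined, for example as a weighted sum; the abstract does not specify the weighting, so any coefficients would be an implementation choice.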