Abstract:
Named entity recognition aims to identify entities in unstructured text and classify them into predefined semantic types. With the rapid growth of social media, text is often accompanied by visual information, and unimodal named entity recognition methods perform poorly on such data. Multimodal named entity recognition has therefore become an active research topic in recent years. However, the representational gap between modalities introduces visual noise, and ambiguous context around entity mentions within the text makes entities harder to recognise. To address these problems, this paper proposes a multimodal named entity recognition method based on reinforcement strategy feedback with adaptive loss. First, the method adopts a three-stage chain-of-thought (CoT) procedure based on GPT-4o that incorporates the idea of policy optimisation from reinforcement learning. The degree of match between image and text is scored through progressive inference and adaptive feedback, and an adaptive decision function filters out the noise introduced by visual information. Second, loss functions are designed for four specific tasks and fused with adaptive weighting to alleviate context ambiguity in entity recognition. Experiments on two representative public datasets (Twitter-2015 and Twitter-2017) show that the proposed method achieves overall F1 scores of 86.45 and 93.80, respectively, outperforming the current baseline models and validating its effectiveness.
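The two mechanisms summarised above, score-based visual filtering and adaptive fusion of several task losses, can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the function names, the fixed threshold of 0.5, the choice to scale retained features by the match score, and the uncertainty-style weighting formula are hypothetical stand-ins for the paper's adaptive decision function and adaptive loss weighting, whose exact forms the abstract does not state.

```python
import math


def gate_visual_features(match_score, visual_features, threshold=0.5):
    """Hypothetical decision function (assumption, not the paper's).

    Keeps the visual features, scaled by the image-text match score, when
    the score reaches the threshold; otherwise returns zeros so that the
    image contributes nothing to the fused representation.
    """
    if match_score >= threshold:
        return [match_score * v for v in visual_features]
    return [0.0 for _ in visual_features]


def fuse_losses(losses, log_vars):
    """Hypothetical adaptive fusion of per-task losses (assumption).

    Uses an uncertainty-style weighting: each loss L_i is scaled by
    exp(-s_i), and s_i is added as a regulariser so the weights cannot
    collapse to zero. In training, each s_i would be a learnable parameter.
    """
    if len(losses) != len(log_vars):
        raise ValueError("losses and log_vars must have the same length")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))


# Four task losses with all log-variance parameters at their initial value 0,
# which reduces the fusion to a plain sum.
total = fuse_losses([1.0, 2.0, 3.0, 4.0], [0.0, 0.0, 0.0, 0.0])

# A poorly matched image (score below the threshold) is filtered out entirely.
filtered = gate_visual_features(0.2, [1.0, 2.0])
```

In this sketch a larger `s_i` down-weights the corresponding task loss, which is one common way to let a model balance several objectives without hand-tuned coefficients; the paper's own weighting scheme may differ.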