A Multimodal Adaptive Entity Recognition Method Based on Reinforcement Strategy Feedback
Graphical Abstract
Abstract
The core objective of named entity recognition (NER) is to identify entities with specific semantic categories and types in unstructured text. With the rapid growth of social media, textual information is often accompanied by visual content, forming multimodal data. To improve the accuracy of entity recognition, multimodal NER methods fully exploit semantic information from different modalities to achieve complementary and deep fusion of cross-modal features. However, differences in modality representations may introduce visual noise that interferes with entity recognition, and issues such as entity ambiguity or contextual semantic vagueness within the textual modality further complicate recognition. To address these challenges, we propose a multimodal NER method based on reinforcement strategy feedback with an adaptive loss mechanism. First, the method adopts a three-stage chain-of-thought (CoT) reasoning process based on GPT-4o, forming a progressive reasoning framework that incorporates an adaptive feedback mechanism from reinforcement learning. The degree of matching between images and text is scored, and the interference of visual noise is effectively filtered out using an adaptive decision function. Second, four task-specific loss functions are designed and jointly optimized through an adaptive weighted fusion strategy to alleviate the uncertainty caused by contextual ambiguity. Experiments on two representative public datasets (Twitter-2015 and Twitter-2017) show that the overall F1 scores of the proposed method reach 86.45 and 93.80, respectively, a significant improvement over current state-of-the-art baseline models.
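To make the "adaptive weighted fusion" of the four task-specific losses concrete, the following is a minimal sketch of one common way such fusion can be realized, using learnable log-variance weights in PyTorch. The class name, parameterization, and the number of losses are illustrative assumptions, not the authors' actual implementation.

# Illustrative sketch (assumption, not the paper's code): adaptively weighting
# four task-specific losses with learnable log-variance parameters, so the
# optimizer balances the objectives instead of relying on fixed hand-tuned weights.
import torch
import torch.nn as nn

class AdaptiveLossFusion(nn.Module):
    """Combine several task losses with learnable weights.

    Each loss L_i is scaled by exp(-s_i) and regularized by s_i, where
    s_i is a learnable log-variance parameter.
    """

    def __init__(self, num_losses: int = 4):
        super().__init__()
        # One learnable log-variance per task-specific loss.
        self.log_vars = nn.Parameter(torch.zeros(num_losses))

    def forward(self, losses: list[torch.Tensor]) -> torch.Tensor:
        total = torch.zeros((), device=losses[0].device)
        for i, loss in enumerate(losses):
            precision = torch.exp(-self.log_vars[i])
            total = total + precision * loss + self.log_vars[i]
        return total

# Usage example with four dummy scalar losses.
if __name__ == "__main__":
    fusion = AdaptiveLossFusion(num_losses=4)
    dummy_losses = [torch.rand(()) for _ in range(4)]
    combined = fusion(dummy_losses)
    combined.backward()  # gradients also flow into the weighting parameters
    print(float(combined))

This kind of uncertainty-based weighting is one standard design choice for joint multi-loss optimization; the paper's actual fusion strategy may differ in form and in how the weights are updated.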