
A Multimodal Adaptive Entity Recognition Method Based on Reinforcement Strategy Feedback

Abstract: The core objective of named entity recognition is to extract entities with specific semantic categories, along with their entity types, from unstructured text. With the rapid development of social media, visual information often appears alongside text, which degrades the performance of unimodal named entity recognition methods; multimodal named entity recognition has therefore become a research hotspot in recent years. However, representational differences between modalities introduce visual noise interference, and ambiguous entity references within the text modality make entity recognition difficult. To address these problems, this paper proposes a multimodal entity recognition method based on reinforcement strategy feedback and adaptive loss. First, the method uses a three-stage chain-of-thought (CoT) procedure built on GPT-4o, combined with the policy-optimization idea from reinforcement learning, to score the degree of matching between an image and its text through progressive reasoning and adaptive feedback, and it filters the noise introduced by visual information with an adaptive decision function. Second, loss functions for four specific tasks are designed and fused with adaptive weighting to alleviate the problem of ambiguous context in entity recognition. Experiments on two representative public datasets (Twitter-2015 and Twitter-2017) show that the proposed method achieves overall F1 scores of 86.45 and 93.80, respectively, outperforming current baseline models and validating its effectiveness.
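The abstract mentions two concrete mechanisms: an adaptive decision function that filters visual noise based on an image-text match score, and an adaptive weighted fusion of four task-specific losses. The sketch below illustrates one plausible way to realize both in PyTorch; the threshold-based gating and the uncertainty-based (log-variance) weighting are assumptions made for illustration, not the paper's actual formulations, and the function and class names are hypothetical.

```python
import torch
import torch.nn as nn


def gate_visual_features(visual_feats: torch.Tensor,
                         match_score: float,
                         threshold: float = 0.5) -> torch.Tensor:
    """Zero out visual features when the image-text match score is low.

    A hypothetical stand-in for the paper's adaptive decision function;
    the actual decision rule is not specified in the abstract.
    """
    return visual_feats if match_score >= threshold else torch.zeros_like(visual_feats)


class AdaptiveLossFusion(nn.Module):
    """Fuse task-specific losses with learnable adaptive weights.

    Uses the common homoscedastic-uncertainty formulation (one learnable
    log-variance per task). Illustrative only; the paper's weighting
    scheme may differ.
    """

    def __init__(self, num_tasks: int = 4):
        super().__init__()
        # One learnable log-variance per task; initialized to 0 (weight = 1).
        self.log_vars = nn.Parameter(torch.zeros(num_tasks))

    def forward(self, task_losses):
        total = torch.zeros(())
        for i, loss in enumerate(task_losses):
            weight = torch.exp(-self.log_vars[i])             # adaptive task weight
            total = total + weight * loss + self.log_vars[i]  # regularizer term
        return total


if __name__ == "__main__":
    # Four placeholder task losses; the paper's actual task decomposition
    # is not reproduced here.
    losses = [torch.rand(()) for _ in range(4)]
    print(AdaptiveLossFusion(num_tasks=4)(losses))
```

The log-variance formulation keeps each task's weight positive and lets the optimizer learn the balance between tasks jointly with the model parameters, which is one standard way to implement adaptive weighted loss fusion.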

       
