    Guo Wenya, Zhang Ying, Liu Shengzhe, Yang Jufeng, Yuan Xiaojie. Relationship Aggregation Network for Referring Expression Comprehension[J]. Journal of Computer Research and Development, 2023, 60(11): 2611-2623. DOI: 10.7544/issn1000-1239.202220019

    Relationship Aggregation Network for Referring Expression Comprehension

    • In this paper, we focus on the task of referring expression comprehension (REC), which aims to locate the image regions referred to by natural-language expressions. One of the main challenges is to visually ground the object relationships described by the input expression. Mainstream methods score each candidate object according to its visual attributes and its relationships with other objects, and predict the highest-scoring object as the referred region. However, these methods tend to consider only the relationships between the currently evaluated region and its surrounding regions. They ignore the informative interactions among the surrounding regions themselves, which are important for matching the input expression with the visual content of the image. To address this issue, we propose a relationship aggregation network (RAN) that constructs comprehensive relationships and then aggregates them to predict the referred region. Specifically, we construct both kinds of relationships mentioned above based on graph attention networks. Then, the relationships most relevant to the input expression are selected and aggregated with a cross-modality attention mechanism. Finally, we compute matching scores from the aggregated features and use them to predict the referred region. Additionally, we improve the existing erasing strategies in REC by erasing several consecutive words, which encourages the model to find and use more clues. Extensive experiments on three widely used benchmark datasets demonstrate the superiority of the proposed method.
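The scoring and erasing steps summarized above can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation and rests on several assumptions. The relation features for each candidate region are taken as already computed; in RAN they come from graph attention networks over region-to-neighbor and neighbor-to-neighbor pairs. The learned cross-modality attention is replaced by a plain dot-product softmax between a hypothetical expression vector and each relation feature. The helper names (`aggregate_relations`, `predict_region`, `erase_continuous`, `random_erase`), the `[MASK]` token and the maximum span length are all illustrative choices.

```python
import math
import random
from typing import List, Optional, Sequence

MASK = "[MASK]"  # hypothetical placeholder token for erased words (assumption)


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def aggregate_relations(query: Sequence[float],
                        relations: Sequence[Sequence[float]]) -> List[float]:
    """Weight each relation feature by its softmax similarity to the
    expression feature and return the weighted sum. A stand-in for the
    learned cross-modality attention (assumption)."""
    weights = softmax([dot(query, r) for r in relations])
    dim = len(relations[0])
    return [sum(w * r[d] for w, r in zip(weights, relations)) for d in range(dim)]


def predict_region(query: Sequence[float],
                   candidates: Sequence[Sequence[Sequence[float]]]) -> int:
    """candidates[i] holds the relation features built for region i
    (e.g. region-to-neighbor and neighbor-to-neighbor edges).
    Returns the index of the region with the highest matching score."""
    scores = [dot(query, aggregate_relations(query, rels)) for rels in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


def erase_continuous(words: List[str], start: int, length: int) -> List[str]:
    """Replace the consecutive words words[start:start+length] with MASK."""
    end = min(start + length, len(words))
    return words[:start] + [MASK] * (end - start) + words[end:]


def random_erase(words: List[str], max_len: int = 2,
                 rng: Optional[random.Random] = None) -> List[str]:
    """Erase one randomly placed span of 1..max_len consecutive words."""
    if not words:
        return []
    rng = rng or random.Random()
    length = rng.randint(1, min(max_len, len(words)))  # randint is inclusive
    start = rng.randint(0, len(words) - length)        # span always fits
    return erase_continuous(words, start, length)
```

For example, `erase_continuous(["the", "man", "on", "the", "left"], 1, 2)` returns `["the", "[MASK]", "[MASK]", "the", "left"]`. Erasing a consecutive span removes a whole phrase at once rather than an isolated word, which matches the stated intuition of pushing the model toward the remaining clues.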
