ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development, 2019, Vol. 56, Issue (8): 1721-1730. doi: 10.7544/issn1000-1239.2019.20190329

Special Issue: 2019 Special Topic on Frontier Advances in Artificial Intelligence


Scene Graph Generation Based on Shuffle Residual Context Information

Lin Xin¹, Tian Xin¹, Ji Yi¹, Xu Yunlong², Liu Chunping¹,³

  1. School of Computer Science and Technology, Soochow University, Suzhou, Jiangsu 215006
  2. Applied Technology College of Soochow University, Suzhou, Jiangsu 215300
  3. Key Laboratory of Symbol Computation and Knowledge Engineering (Jilin University), Ministry of Education, Changchun 130012
  • Online: 2019-08-01

Abstract: Scene graphs play an important role in visual understanding. Existing scene graph generation methods focus on recognizing the subjects, the objects, and the predicates between them. Humans, however, abstract relationships by drawing on spatial context, semantic context, and the interactions among the objects in a scene, which lets them understand and reason about the scene as a whole. To obtain a better global context representation and reduce the impact of dataset bias, we propose a new scene graph generation framework called the residual shuffle sequence model (RSSQ). The method consists of three modules: object decoding, residual shuffle, and position embedding. The residual shuffle module stacks two basic structures, a random shuffle operation and a residual bidirectional LSTM. We apply the random shuffle iteratively to the hidden states of the bidirectional LSTM to reduce the impact of dataset bias, and extract shared global context information through the residual connections. To strengthen the spatial relationship between pairs of objects, the position embedding module encodes the relative position and the area ratio of the two objects. Experiments on three sub-tasks of differing difficulty on the Visual Genome dataset show that, owing to better global context and spatial information, the proposed method generates better scene graphs under both the Recall@50 and Recall@100 metrics.
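The abstract does not give the exact update rule of the residual shuffle module, so the following is only a minimal sketch under stated assumptions: each layer feeds the object features to a bidirectional context encoder in a freshly drawn random order, restores the original order, and adds the input back as a residual; the layer is then stacked several times. The names `toy_bidirectional_context`, `residual_shuffle_layer`, and `residual_shuffle_stack` are hypothetical, and `toy_bidirectional_context` is a dependency-free stand-in for a real BiLSTM, not the paper's encoder.

```python
import random
from typing import Callable, List, Sequence

Vector = List[float]
ContextFn = Callable[[Sequence[Vector]], List[Vector]]


def toy_bidirectional_context(seq: Sequence[Vector]) -> List[Vector]:
    """Hypothetical stand-in for a BiLSTM (not the paper's encoder).

    Output i averages a forward summary (mean of seq[0..i]) and a
    backward summary (mean of seq[i..n-1]), so every position sees
    context from both directions, as a BiLSTM hidden state would.
    """
    n, dim = len(seq), len(seq[0])
    fwd: List[Vector] = []
    bwd: List[Vector] = [[] for _ in range(n)]
    total = [0.0] * dim
    for i, v in enumerate(seq):
        total = [t + x for t, x in zip(total, v)]
        fwd.append([t / (i + 1) for t in total])
    total = [0.0] * dim
    for count, i in enumerate(reversed(range(n)), start=1):
        total = [t + x for t, x in zip(total, seq[i])]
        bwd[i] = [t / count for t in total]
    return [[(f + b) / 2 for f, b in zip(fv, bv)] for fv, bv in zip(fwd, bwd)]


def residual_shuffle_layer(h: List[Vector], context_fn: ContextFn,
                           rng: random.Random) -> List[Vector]:
    """One assumed layer: shuffle, encode, un-shuffle, add the residual."""
    order = list(range(len(h)))
    rng.shuffle(order)                      # random object order for this pass
    ctx = context_fn([h[i] for i in order])
    restored: List[Vector] = [[] for _ in h]
    for pos, i in enumerate(order):         # undo the permutation
        restored[i] = ctx[pos]
    # Residual connection: keep the input features and add the new context.
    return [[a + c for a, c in zip(x, r)] for x, r in zip(h, restored)]


def residual_shuffle_stack(h: List[Vector], num_layers: int,
                           context_fn: ContextFn = toy_bidirectional_context,
                           seed: int = 0) -> List[Vector]:
    """Apply the assumed layer num_layers times, with a new order each time."""
    rng = random.Random(seed)
    for _ in range(num_layers):
        h = residual_shuffle_layer(h, context_fn, rng)
    return h
```

For example, `residual_shuffle_stack([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]], num_layers=2)` returns one context-enriched vector per object, aligned with the input order. Because the order changes on every pass, the encoder cannot depend on any fixed object ordering, which is one plausible reading of how shuffling could reduce ordering-related bias; in a real model the stand-in would be replaced by a trained bidirectional LSTM.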
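The position embedding module is described only as encoding the relative position and area ratio of a pair of objects. The sketch below shows one common way to compute such pairwise features from bounding boxes; the exact normalization used in RSSQ is not stated in the abstract, so the feature set and the function name `pair_position_features` are assumptions for illustration.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x2 > x1, y2 > y1


def pair_position_features(subj: Box, obj: Box) -> List[float]:
    """Hypothetical pairwise encoding: center offset, log size ratios, area ratio."""
    sx1, sy1, sx2, sy2 = subj
    ox1, oy1, ox2, oy2 = obj
    sw, sh = sx2 - sx1, sy2 - sy1
    ow, oh = ox2 - ox1, oy2 - oy1
    if min(sw, sh, ow, oh) <= 0:
        raise ValueError("boxes must have positive width and height")
    # Object-center offset relative to the subject center, scaled by the
    # subject's size so the feature does not depend on image resolution.
    dx = ((ox1 + ow / 2) - (sx1 + sw / 2)) / sw
    dy = ((oy1 + oh / 2) - (sy1 + sh / 2)) / sh
    # Log ratios make "twice as wide" and "half as wide" symmetric around 0.
    log_w = math.log(ow / sw)
    log_h = math.log(oh / sh)
    # Area ratio of object to subject.
    area_ratio = (ow * oh) / (sw * sh)
    return [dx, dy, log_w, log_h, area_ratio]
```

For a 2x2 subject at the origin and a 2x4 object directly to its right, `pair_position_features((0, 0, 2, 2), (2, 0, 4, 4))` gives an offset of one subject width to the right and an area ratio of 2.0. Such a vector would typically be projected by a small learned layer and combined with the pair's visual and context features before predicate classification.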
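Recall@K, the metric reported in the abstract, measures the fraction of ground-truth (subject, predicate, object) triplets that appear among the K highest-scoring predicted triplets. The sketch below computes it for a single image, assuming objects are already identified by index. In the commonly used Visual Genome protocol the score is averaged over images, and the detection-based sub-task additionally requires the predicted boxes to overlap the ground truth (IoU of at least 0.5); that box matching is omitted here, and `recall_at_k` is a hypothetical helper.

```python
from typing import Iterable, Set, Tuple

Triplet = Tuple[int, str, int]  # (subject index, predicate, object index)


def recall_at_k(predictions: Iterable[Tuple[float, Triplet]],
                ground_truth: Set[Triplet], k: int) -> float:
    """Fraction of ground-truth triplets found among the top-k predictions."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    # Rank predictions by confidence and keep only the k best.
    ranked = sorted(predictions, key=lambda p: p[0], reverse=True)[:k]
    hits = {triplet for _, triplet in ranked} & ground_truth
    return len(hits) / len(ground_truth)
```

With ground truth `{(0, "on", 1), (1, "near", 2)}` and predictions scored 0.9, 0.8, 0.7 for `(0, "on", 1)`, `(0, "near", 2)`, `(1, "near", 2)`, Recall@1 and Recall@2 are both 0.5 and Recall@3 is 1.0. Larger K can only keep recall the same or raise it, which is why Recall@100 is never lower than Recall@50 for the same model.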

Key words: scene graph, visual relationship, context, residual bidirectional LSTM, object detection
