Abstract:
Scene graphs play an important role in visual understanding. Existing scene graph generation methods focus on the subjects, the objects, and the predicates between them. Humans, however, abstract relationships using spatial relation context, semantic context, and the interactions between scene objects, understanding and reasoning about the scene as a whole. To obtain a better global context representation and reduce the impact of dataset bias, we propose a new scene graph generation framework, called the residual shuffle sequence model (RSSQ). Our method consists of object decoding, residual shuffle, and position embedding modules. The residual shuffle module is stacked from two basic structures: a random shuffle operation and a residual bidirectional LSTM. We apply the random shuffle to the hidden states of the bidirectional LSTM iteratively to reduce the impact of dataset bias, and extract shared global context information through the residual connection structure. To strengthen the spatial relationship between pairwise objects, the position embedding module encodes the relative positions and area ratios of objects. Experimental results on three sub-tasks of different difficulty on the Visual Genome dataset demonstrate that the proposed method generates better scene graphs under the Recall@50 and Recall@100 settings, owing to better global context and spatial information.
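
The core mechanism described above, randomly shuffling the object sequence fed to a bidirectional LSTM and adding a residual connection around it, can be sketched as follows. This is a minimal illustration under assumed shapes and module names (the class `ResidualShuffleBiLSTM` and its dimensions are hypothetical, not taken from the paper), not the authors' implementation.

```python
import torch
import torch.nn as nn

class ResidualShuffleBiLSTM(nn.Module):
    """Hypothetical sketch of one residual shuffle layer: a bidirectional
    LSTM applied to a randomly permuted object sequence, whose output is
    restored to the original order and added back to the input."""

    def __init__(self, dim, hidden):
        super().__init__()
        self.lstm = nn.LSTM(dim, hidden, bidirectional=True, batch_first=True)
        # Project the 2*hidden BiLSTM output back to the input width
        # so the residual addition is well-defined.
        self.proj = nn.Linear(2 * hidden, dim)

    def forward(self, x):
        # x: (batch, num_objects, dim) -- per-object context features
        n = x.size(1)
        perm = torch.randperm(n)        # random shuffle of object order
        inv = torch.argsort(perm)       # inverse permutation to undo it
        out, _ = self.lstm(x[:, perm])  # BiLSTM over the shuffled sequence
        out = self.proj(out)[:, inv]    # restore original object order
        return x + out                  # residual connection
```

Stacking several such layers, each with a fresh permutation, would realize the iterative shuffle the abstract describes, decorrelating the learned context from any fixed object ordering in the training data.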