The Semantic Knowledge Embedded Deep Representation Learning and Its Applications on Visual Understanding
Abstract: With the rapid development of deep learning techniques and large-scale visual datasets, traditional computer vision tasks have seen unprecedented improvement. How to integrate domain knowledge from traditional vision research into deep neural networks, so that deep models represent visual patterns better and can handle increasingly complex vision tasks, has become a widely discussed topic in both academia and industry. This thesis explores deep models that combine semantic knowledge with feature learning. The main contributions are as follows:
1) We integrate a single type of semantic information, the category-level semantic similarity of visual data, into deep feature learning, and propose a deep similarity comparison model named bit-scalable deep hashing, which embeds regularized semantic associations. Applied to image similarity comparison and retrieval, the model achieves large performance gains on image search and person identification.
2) We integrate multiple types of semantic context into deep feature learning, and propose a high-order graph LSTM (HG-LSTM) network, a scene context learning method based on long short-term memory networks, for geometric attribute analysis of complex scenes. Extensive experiments show that the model predicts rich scene geometric attributes and outperforms several state-of-the-art methods by large margins.
3) We integrate the structured semantic configuration of visual data into deep representation learning, and propose a novel deep architecture that incorporates grammar knowledge to address a fundamental problem of scene understanding: how to parse a scene image into a structured configuration. Extensive experiments show that the model produces meaningful, structured scene configurations and achieves more favorable scene labeling results on two challenging datasets than other state-of-the-art weakly supervised deep learning methods.
Keywords:
- deep learning
- neural networks
- semantic embedding
- scene parsing
- similarity search