Antagonistic Video Generation Method Based on Multimodal Input
Abstract: Video generation is an important and challenging task in computer vision and multimedia. Existing video generation methods based on generative adversarial networks (GANs) usually lack an effective, controllable way to generate coherent video. Algorithms that can automatically generate realistic video are an important indicator of a more complete understanding of visual appearance and motion information. This paper proposes a new multimodal conditional video generation model. The model takes an image and text as input, obtains the motion information of the video through a text feature encoding network and a motion feature decoding network, and combines it with the input image to generate a coherent motion video sequence. In addition, the method predicts video frames by applying affine transformations to the input image, which makes the generative model more controllable and the generated results more robust. Experimental results on the SBMG (single-digit bouncing MNIST GIFs), TBMG (two-digit bouncing MNIST GIFs), and KTH (Kungliga Tekniska Högskolan human actions) datasets show that the proposed method outperforms existing video generation methods in both target clarity and video coherence. Qualitative evaluation and quantitative evaluation with the SSIM (structural similarity index) and PSNR (peak signal-to-noise ratio) metrics further show that the proposed multimodal video frame generation network plays a key role in video generation.
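The idea of predicting a frame by an affine transformation of the input image, and of scoring the result with PSNR, can be illustrated with a minimal NumPy sketch. This is not the paper's network: in the proposed model the transformation would presumably be produced by the learned motion feature decoding network, whereas here the matrix `theta` is hand-picked. The function names `affine_warp` and `psnr`, the inverse-mapping convention, and the nearest-neighbour sampling are all assumptions made for illustration.

```python
import numpy as np


def affine_warp(image, theta):
    """Warp a 2-D grayscale image with a 2x3 affine matrix (hypothetical helper).

    theta maps each OUTPUT pixel (x, y, 1) to a SOURCE location (inverse
    mapping). Sampling is nearest-neighbour; out-of-range pixels become 0.
    """
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Homogeneous output coordinates, one column per pixel: shape (3, h*w).
    coords = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.asarray(theta, dtype=np.float64) @ coords  # shape (2, h*w)
    sx = np.rint(src[0]).astype(int)
    sy = np.rint(src[1]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros(h * w, dtype=image.dtype)
    out[valid] = image[sy[valid], sx[valid]]
    return out.reshape(h, w)


def psnr(reference, estimate, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equally sized images."""
    diff = reference.astype(np.float64) - estimate.astype(np.float64)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(max_value ** 2 / mse)


# Toy "first frame": an 8x8 image containing a 2x2 bright block.
frame0 = np.zeros((8, 8), dtype=np.float64)
frame0[2:4, 2:4] = 1.0

# Hand-picked affine parameters: shift the content one pixel to the right.
# With inverse mapping, output (x, y) samples source (x - 1, y).
theta = np.array([[1.0, 0.0, -1.0],
                  [0.0, 1.0, 0.0]])

frame1 = affine_warp(frame0, theta)
score = psnr(frame0, frame1)
```

Because every predicted pixel is copied from the input image, the object's appearance is preserved by construction and only the few transformation parameters vary, which is one plausible reading of why the abstract describes the approach as more controllable.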