ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2017, Vol. 54 ›› Issue (1): 71-79. doi: 10.7544/issn1000-1239.2017.20150707

• Artificial Intelligence •

网络媒体大数据流异构多模态目标识别策略(201905撤稿)

文孟飞1,4,刘伟荣1,胡超2,3   

  1. 1(中南大学信息科学与工程学院 长沙 410083); 2(中南大学信息与网络中心 长沙 410083); 3(医学信息研究湖南省普通高等学校重点实验室(中南大学) 长沙 410083); 4(湖南省教育科学研究院 长沙 410005) (wmfdcf@126.com)
  • Published: 2017-01-01
  • Supported by: 
    This work was supported by the Key Project of Educational and Scientific Foundation of Hunan Province During the 12th Five-Year Plan Period (XJK014AJC001) and the National Natural Science Foundation of China (61379111, 61672539, 61202342).

A Heterogeneous Multimodal Object Recognition Strategy of the Massive Network Data Flow (Retracted, May 2019)

Wen Mengfei1,4, Liu Weirong1, Hu Chao2,3   

  1. 1(School of Information Science and Engineering, Central South University, Changsha 410083); 2(Information and Network Center, Central South University, Changsha 410083); 3(Key Laboratory of Medical Information Research (Central South University), College of Hunan Province, Changsha 410083); 4(Hunan Provincial Research Institute of Education, Changsha 410005)
  • Online: 2017-01-01

摘要: 如何对海量的网络媒体大数据进行准确地目标识别,是当前的一个研究热点和难点.针对此问题提出一种利用媒体流时间相关特性的异构多模态目标识别策略.首先基于媒体流中同时存在音频和视频信息的特征,建立一种异构多模态深度学习结构;结合卷积神经网络(convolutional neural network, CNN)和限制波尔兹曼机(restricted Boltzmann machine, RBM)的算法优点,对音频信息和视频信息分别并行处理,这种异构模式可以充分利用不同深度神经网络的特点;然后生成基于典型关联分析的共享特征表示,并进一步利用时间相关特性进行参数的优化.3种对比实验用来验证所提策略的效果,首先将策略与单一模态算法进行对比;然后再在复合的数据库上建立对比实验;最后在网络视频库上建立对比实验,这些对比实验验证了策略的有效性.

关键词: 目标识别, 深度学习, 卷积神经网络, 限制玻尔兹曼机, 典型关联分析

Abstract: Accurate object recognition in massive network media data is currently an active and difficult research topic. To address this problem, an object recognition strategy is proposed for massive network media data flows; it adopts a heterogeneous multimodal structure and exploits temporal coherence. First, because media streams carry video and audio at the same time, a heterogeneous multimodal deep learning structure is constructed that combines a convolutional neural network (CNN) with a restricted Boltzmann machine (RBM). The audio information is processed by the RBM and the video information by the CNN, in parallel. This heterogeneous structure can exploit the strengths of different deep neural networks. Next, a shared feature representation is generated using canonical correlation analysis (CCA). The temporal coherence of video frames is then used to further improve recognition accuracy. Three kinds of experiments are conducted to validate the proposed strategy. The first compares it with single-modality algorithms. The second reports results on a composite database. The third uses videos extracted from real websites to compare the proposed strategy with other algorithms. These experiments validate the effectiveness of the proposed heterogeneous multimodal strategy.

Key words: object recognition, deep learning, convolutional neural network (CNN), restricted Boltzmann machine (RBM), canonical correlation analysis (CCA)
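
The CCA fusion step named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the random arrays `video` and `audio` are hypothetical stand-ins for CNN and RBM features, the helper names `cca_fit` and `shared_representation` are invented for illustration, and concatenating the two projections is only one assumed way to form a shared representation.

```python
import numpy as np

def cca_fit(X, Y, k, reg=1e-6):
    """Fit linear CCA between two views (rows are samples).

    Returns the view means, the projection matrices, and the top-k
    canonical correlations. `reg` is a small ridge term (an assumption)
    that keeps the covariance matrices invertible.
    """
    mx, my = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - mx, Y - my
    n = X.shape[0]
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)

    def inv_sqrt(S):
        # Inverse matrix square root via symmetric eigendecomposition.
        w, V = np.linalg.eigh(S)
        return V @ np.diag(w ** -0.5) @ V.T

    Kx, Ky = inv_sqrt(Sxx), inv_sqrt(Syy)
    # Singular values of the whitened cross-covariance are the
    # canonical correlations.
    U, s, Vt = np.linalg.svd(Kx @ Sxy @ Ky)
    Wx = Kx @ U[:, :k]
    Wy = Ky @ Vt.T[:, :k]
    return mx, my, Wx, Wy, s[:k]

def shared_representation(X, Y, model):
    """Project both views and concatenate them (assumed fusion rule)."""
    mx, my, Wx, Wy, _ = model
    return np.hstack([(X - mx) @ Wx, (Y - my) @ Wy])

def demo(seed=0):
    """Synthetic two-view data driven by a common 2-D latent signal."""
    rng = np.random.default_rng(seed)
    n, latent = 500, 2
    z = rng.normal(size=(n, latent))
    # Hypothetical stand-ins for per-frame CNN and RBM feature vectors.
    video = z @ rng.normal(size=(latent, 6)) + 0.1 * rng.normal(size=(n, 6))
    audio = z @ rng.normal(size=(latent, 4)) + 0.1 * rng.normal(size=(n, 4))
    model = cca_fit(video, audio, k=latent)
    shared = shared_representation(video, audio, model)
    return shared, model[4]

shared, correlations = demo()
print(shared.shape, correlations)
```

Because both synthetic views depend on the same latent signal, the leading canonical correlation should be high; with real audio and video features the correlations would depend entirely on the data.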

CLC number: