    Wen Mengfei, Liu Weirong, Hu Chao. A Heterogeneous Multimodal Object Recognition Strategy of the Massive Network Data Flow(201905Retraction)[J]. Journal of Computer Research and Development, 2017, 54(1): 71-79. DOI: 10.7544/issn1000-1239.2017.20150707

    A Heterogeneous Multimodal Object Recognition Strategy of the Massive Network Data Flow(201905Retraction)

    • Abstract: Accurate object recognition on massive network media data is an active and difficult research topic. To address it, an object recognition strategy is proposed that adopts a heterogeneous multimodal structure and exploits the temporal coherence of media streams. First, because media streams carry audio and video information at the same time, a heterogeneous multimodal deep learning structure is built that combines a convolutional neural network (CNN) and a restricted Boltzmann machine (RBM). The audio information is processed by the RBM and the video information by the CNN, in parallel, so the structure can exploit the strengths of different deep neural networks. Next, a shared feature representation is generated using canonical correlation analysis (CCA), and the temporal coherence of video frames is then used to optimize the parameters and further improve recognition accuracy. Three kinds of experiments are used to validate the proposed strategy: the first compares it with single-modality algorithms, the second is carried out on a composite database, and the third compares it with other algorithms on videos extracted from real websites. These experiments verify the effectiveness of the proposed heterogeneous multimodal strategy.
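The abstract names canonical correlation analysis as the step that fuses the two modalities but gives no implementation detail. The following is a minimal sketch of plain linear CCA in NumPy, not the authors' method: the function name `cca`, the regularization term `reg`, the synthetic `audio_feats` and `video_feats` arrays (stand-ins for RBM and CNN outputs), and the choice to concatenate the two projections into one shared vector are all assumptions made for illustration.

```python
import numpy as np

def cca(X, Y, k, reg=1e-6):
    """Linear CCA: return projections Wx, Wy and the top-k canonical correlations.

    X: (n, dx) features of modality 1; Y: (n, dy) features of modality 2.
    reg is a small ridge term (an assumption) that keeps the covariances invertible.
    """
    n = X.shape[0]
    Xc = X - X.mean(axis=0)
    Yc = Y - Y.mean(axis=0)
    Sxx = Xc.T @ Xc / (n - 1) + reg * np.eye(X.shape[1])
    Syy = Yc.T @ Yc / (n - 1) + reg * np.eye(Y.shape[1])
    Sxy = Xc.T @ Yc / (n - 1)
    # Whiten each modality with its Cholesky factor: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)
    # Cross-covariance of the whitened variables: Lx^{-1} Sxy Ly^{-T}.
    T = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(T, full_matrices=False)
    # Map the singular vectors back to the original feature spaces.
    Wx = np.linalg.solve(Lx.T, U[:, :k])
    Wy = np.linalg.solve(Ly.T, Vt.T[:, :k])
    return Wx, Wy, s[:k]

# Synthetic stand-ins: both modalities observe the same 2-D latent signal plus noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
audio_feats = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(500, 6))
video_feats = latent @ rng.normal(size=(2, 8)) + 0.1 * rng.normal(size=(500, 8))

Wx, Wy, corrs = cca(audio_feats, video_feats, k=2)
audio_proj = (audio_feats - audio_feats.mean(axis=0)) @ Wx
video_proj = (video_feats - video_feats.mean(axis=0)) @ Wy
# One possible shared representation: concatenate the two projected views.
shared = np.hstack([audio_proj, video_proj])
```

Concatenation is only one way to form the shared vector; summing or averaging the two projections is an equally common choice, and the source does not say which one the paper uses.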

       
