Abstract:
It is a research hot to achieve the object recognition of the massive network media data nowadays. To address the problem, an object recognition strategy is proposed to handle the massive network media data flow which adopts heterogeneous multimodal structure while utilizing the temporal coherence. Firstly, based on the video and audio co-existing feature of media network data, a heterogeneous multimodal structure is constructed to incorporate the convolutional neural network(CNN) and the restricted Boltzmann machine(RBM). The audio information is processed by restricted Boltzmann machine and the video information is processed by convolutional neural network respectively. The heterogeneous multimodal structure can exploit the merits of different deep learning neural networks. After that, the share characteristic representation are generated by using the canonical correlation analysis(CCA). Then the temporal coherence of video frame is utilized to improve the recognizing accuracy further. There kinds of experiments are adopted to validate the effectiveness of the proposed strategy. The first type of experiment compares the proposed strategy with single-mode algorithm. The second type of experiment illustrates the result based on composite database. Finally the videos coming from real websites are extracted to compare the proposed strategy with other algorithms. These experiments prove the effectiveness of the proposed heterogeneous multimodal strategy.