Citation: Li Hongjun, Ding Yupeng, Li Chaobo, Zhang Shibing. Action Recognition of Temporal Segment Network Based on Feature Fusion[J]. Journal of Computer Research and Development, 2020, 57(1): 145-158. DOI: 10.7544/issn1000-1239.2020.20190180

Action Recognition of Temporal Segment Network Based on Feature Fusion

Funds: This work was supported by the National Natural Science Foundation of China (61871241), the Ministry of Education Cooperation in Production and Education (201802302115), the Educational Science Research Subject of China Transportation Education Research Association (Jiaotong Education Research 1802-118), the Science and Technology Program of Nantong (JC2018025, JC2018129), the Nanjing University State Key Laboratory for Novel Software Technology (KFKT2019B015), the Postgraduate Research and Practice Innovation Program of Jiangsu Province (KYCX19_2056), and the Nantong University-Nantong Joint Research Center for Intelligent Information Technology (KFKT2017B04).
  • Published Date: December 31, 2019
  • Abstract: Action recognition is a hot research topic and a challenging task in computer vision. Recognition performance is closely tied to the type of network input data, the network structure, and the way features are fused. At present, the main inputs to action recognition networks are RGB images and optical flow images, and the dominant architectures are two-stream networks and 3D convolutional networks. The choice of features directly affects recognition efficiency, and many problems in multi-layer feature fusion remain unsolved. To address the limitations of RGB and optical flow images as inputs to the popular two-stream convolutional network, sparse features extracted in a low-rank space are used to capture the characteristics of moving objects in video and to supplement the network input. Meanwhile, to compensate for the lack of information interaction in deep networks, high-level semantic information is combined with low-level detail information for recognition, which gives the temporal segment network a further performance advantage. Extensive subjective and objective comparison experiments on UCF101 and HMDB51 show that the proposed algorithm is significantly better than several state-of-the-art algorithms, reaching average accuracies of 97.1% and 76.7% on the two datasets, respectively. The experimental results show that the method effectively improves the action recognition rate.
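The abstract only summarizes the low-rank/sparse idea at a high level. As a rough, non-authoritative illustration, the sketch below decomposes a stack of grayscale frames into a low-rank background and a sparse foreground using a generic robust-PCA routine (inexact augmented Lagrangian). The function name, parameter choices, and the use of NumPy are assumptions for illustration, not the authors' implementation; the sparse component is the part that would highlight moving objects and could supplement RGB and optical flow inputs.

```python
import numpy as np

def rpca_sparse_motion(frames, lam=None, max_iter=100, tol=1e-7):
    """Hypothetical helper: split grayscale frames (num_frames, H, W) into a
    low-rank background and a sparse component that highlights moving objects.
    Returns the sparse component reshaped back to the frame-stack shape."""
    n, h, w = frames.shape
    D = frames.reshape(n, h * w).astype(np.float64).T   # pixels x frames
    if lam is None:
        lam = 1.0 / np.sqrt(max(D.shape))                # standard RPCA weight
    norm_D = np.linalg.norm(D, 'fro')
    mu = 1.25 / np.linalg.norm(D, 2)                     # spectral-norm heuristic
    rho = 1.5
    L = np.zeros_like(D)   # low-rank part (static background)
    S = np.zeros_like(D)   # sparse part (moving objects)
    Y = np.zeros_like(D)   # Lagrange multipliers

    def shrink(X, tau):                                  # soft-thresholding
        return np.sign(X) * np.maximum(np.abs(X) - tau, 0.0)

    for _ in range(max_iter):
        # Singular-value thresholding step for the low-rank component
        U, sig, Vt = np.linalg.svd(D - S + Y / mu, full_matrices=False)
        L = U @ np.diag(shrink(sig, 1.0 / mu)) @ Vt
        # Soft-thresholding step for the sparse component
        S = shrink(D - L + Y / mu, lam / mu)
        residual = D - L - S
        Y += mu * residual
        mu *= rho
        if np.linalg.norm(residual, 'fro') / norm_D < tol:
            break
    return S.T.reshape(n, h, w)
```

The multi-layer fusion is likewise only described in outline. The following sketch shows one common way to combine a pooled low-level feature map (detail) with a high-level feature vector (semantics) before the classifier; the class name and the concatenate-then-linear design are assumptions in the spirit of the description, not the paper's exact fusion scheme.

```python
import torch
import torch.nn as nn

class MultiLevelFusionHead(nn.Module):
    """Hypothetical fusion head: pools a shallow feature map, concatenates it
    with the deep feature vector, and classifies the fused representation."""
    def __init__(self, low_channels, high_channels, num_classes):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(low_channels + high_channels, num_classes)

    def forward(self, low_feat_map, high_feat_vec):
        low_vec = self.pool(low_feat_map).flatten(1)     # (N, low_channels)
        fused = torch.cat([low_vec, high_feat_vec], dim=1)
        return self.fc(fused)
```

Both sketches are minimal and untuned; in a temporal segment network setting the fused scores would still be averaged over the sampled segments before the final prediction.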
