ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development ›› 2015, Vol. 52 ›› Issue (12): 2802-2812. doi: 10.7544/issn1000-1239.2015.20140553

• Artificial Intelligence •



  • Published: 2015-12-01

A New Framework of Action Recognition: 3DHOGTCC and 3DHOOFG

Tong Ming, Wang Fan, Wang Shuo, Ji Chenglong   

  1. (School of Electronic Engineering, Xidian University, Xi’an 710071)
  • Online: 2015-12-01

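Abstract (translated from the Chinese): Action recognition has high academic research value and broad market application prospects in the field of semantic analysis. To describe actions in video accurately, two methods for constructing dense-trajectory motion descriptors are proposed. 1) Optical-flow constraints and clustering are used to sample the motion region densely, which captures the local position information of the action; 2) corners of the moving target are selected as feature points, and motion trajectories are obtained by tracking these points; 3) within the video cube centered on each trajectory, a 3D histograms of oriented gradients in trajectory centered cube (3DHOGTCC) descriptor and a 3D histograms of oriented optical flow gradients (3DHOOFG) descriptor are constructed separately to describe the local motion information accurately. To make full use of the scene in which an action occurs, a new action recognition framework that fuses dynamic and static descriptors is proposed, so that dynamic and static features support each other and good recognition results are obtained even in complex scenes such as those with camera motion. Leave-one-out cross-validation is used on the Weizmann and UCF-Sports datasets, and 4-fold cross-validation on the KTH and Youtube datasets. The experiments demonstrate the effectiveness of the proposed framework.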

Keywords (translated from the Chinese): feature extraction, action recognition, dense trajectory, optical flow, motion descriptor

Abstract: Video action recognition has high academic research value and broad market application prospects in the field of video semantic analysis. To describe video actions accurately, this paper proposes two motion descriptors based on dense trajectories. First, to capture the local motion information of an action, the moving-object region is densely sampled by constraining and clustering the optical flow. Second, corners of the moving target are selected as feature points and tracked to obtain dense trajectories. Finally, within the video cube centered on each trajectory, a 3D histograms of oriented gradients in trajectory centered cube (3DHOGTCC) descriptor and a 3D histograms of oriented optical flow gradients (3DHOOFG) descriptor are constructed separately to describe the local motion region accurately. To make full use of the scene in which an action occurs, the paper also proposes a framework that combines motion descriptors with static descriptors, so that dynamic features and static background features fuse with and complement each other; the framework achieves good recognition accuracy even in complex scenes such as those with camera motion. Leave-one-out cross-validation is used on the Weizmann and UCF-Sports datasets, and four-fold cross-validation on the KTH and Youtube datasets. The experiments demonstrate the effectiveness of the new framework.

Key words: feature extraction, action recognition, dense trajectory, optical flow, motion descriptor
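
The abstract does not give the construction details of 3DHOGTCC or 3DHOOFG, so the following is only a minimal sketch of the general idea behind a 3D histogram of oriented gradients computed inside a video cube. Everything in it is an assumption made for illustration: the function name `hog3d_cube`, the `cube[t][y][x]` layout, the central-difference gradients, and the azimuth/elevation binning are not taken from the paper. A flow-based variant in the spirit of 3DHOOFG could plausibly apply the same histogram to an optical-flow component instead of pixel intensity, but that too is an assumption.

```python
import math

def hog3d_cube(cube, n_azimuth=8, n_elevation=4):
    """Magnitude-weighted histogram of 3D gradient orientations.

    cube is a nested list indexed as cube[t][y][x] (hypothetical layout).
    Returns an L2-normalized list of n_azimuth * n_elevation bins.
    """
    T, H, W = len(cube), len(cube[0]), len(cube[0][0])
    hist = [0.0] * (n_azimuth * n_elevation)
    for t in range(1, T - 1):
        for y in range(1, H - 1):
            for x in range(1, W - 1):
                # Central differences along x, y and time.
                gx = (cube[t][y][x + 1] - cube[t][y][x - 1]) / 2.0
                gy = (cube[t][y + 1][x] - cube[t][y - 1][x]) / 2.0
                gt = (cube[t + 1][y][x] - cube[t - 1][y][x]) / 2.0
                mag = math.sqrt(gx * gx + gy * gy + gt * gt)
                if mag == 0.0:
                    continue
                # Spatial direction in [0, 2*pi), temporal tilt in [-pi/2, pi/2].
                azimuth = math.atan2(gy, gx) % (2.0 * math.pi)
                elevation = math.atan2(gt, math.hypot(gx, gy))
                a = min(int(azimuth / (2.0 * math.pi) * n_azimuth), n_azimuth - 1)
                e = min(int((elevation + math.pi / 2.0) / math.pi * n_elevation),
                        n_elevation - 1)
                hist[e * n_azimuth + a] += mag
    norm = math.sqrt(sum(v * v for v in hist))
    return [v / norm for v in hist] if norm > 0.0 else hist

# A cube whose intensity increases only along x has a single gradient
# direction, so all of the histogram weight falls into one bin.
ramp = [[[float(x) for x in range(4)] for _ in range(4)] for _ in range(4)]
descriptor = hog3d_cube(ramp)
```

In a trajectory-based pipeline, one such histogram would typically be computed per sub-block of the cube surrounding a tracked point and the results concatenated; the block layout and bin counts used by the authors are not stated in the abstract.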