Abstract:
Video action recognition is of high research value and has broad application prospects in the field of video semantic analysis. To describe video actions accurately, this paper proposes two motion descriptors based on dense trajectories. First, to capture the local motion information of an action, dense sampling is performed in the moving-object region by constraining and clustering the optical flow. Second, corners of the moving target are selected as feature points and tracked to obtain dense trajectories. Finally, a 3D histogram of oriented gradients in the trajectory-centered cube (3DHOGTCC) descriptor and a 3D histogram of oriented optical-flow gradients (3DHOOFG) descriptor are constructed in the video cubes centered on the trajectories to describe the local motion regions accurately. To make full use of the scene in which an action occurs, this paper further proposes a framework that combines the motion descriptors with static descriptors, so that dynamic characteristics and static background features fuse and complement each other; the framework achieves better recognition accuracy even in complex conditions such as camera movement. Leave-one-out cross-validation is adopted on the Weizmann and UCF-Sports datasets, and four-fold cross-validation on the KTH and YouTube datasets; the experiments demonstrate the effectiveness of the proposed framework.
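To illustrate the general idea behind a trajectory-centered orientation histogram, the sketch below builds a per-frame gradient-orientation histogram over a small video cube and concatenates the frames into one normalized descriptor. This is only a minimal illustrative sketch, not the authors' 3DHOGTCC implementation: the function name, the number of orientation bins, and the toy cube are all assumptions introduced here.

```python
import numpy as np

def cube_orientation_histogram(cube, n_bins=8):
    """Illustrative sketch (not the paper's 3DHOGTCC): for each frame of a
    trajectory-centered cube, quantize spatial gradient orientations into
    n_bins bins weighted by gradient magnitude, then concatenate and
    L2-normalize the per-frame histograms into one descriptor."""
    T, H, W = cube.shape
    hist = np.zeros((T, n_bins))
    for t in range(T):
        gy, gx = np.gradient(cube[t].astype(float))      # spatial gradients
        mag = np.hypot(gx, gy)                           # gradient magnitude
        ang = np.mod(np.arctan2(gy, gx), 2 * np.pi)      # orientation in [0, 2*pi)
        bins = np.minimum((ang / (2 * np.pi) * n_bins).astype(int), n_bins - 1)
        for b in range(n_bins):
            hist[t, b] = mag[bins == b].sum()            # magnitude-weighted vote
    desc = hist.ravel()
    norm = np.linalg.norm(desc)
    return desc / norm if norm > 0 else desc

# Toy cube: a bright square moving diagonally across 5 frames of 32x32 pixels.
cube = np.zeros((5, 32, 32))
for t in range(5):
    cube[t, 4 + 2 * t:12 + 2 * t, 4 + 2 * t:12 + 2 * t] = 1.0

desc = cube_orientation_histogram(cube)
print(desc.shape)  # (40,) — 5 frames x 8 orientation bins
```

The same scheme applied to optical-flow fields instead of raw intensities gives the flavor of the 3DHOOFG descriptor, since the histogram is then computed over flow-gradient orientations rather than image-gradient orientations.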