Abstract:
With the rapid development of deep learning, human pose estimation has made remarkable progress in recent years, but existing methods still struggle with occlusion, which is common in real-world images. To address this problem, this paper proposes a human pose estimation method based on keypoint-level occlusion inference. First, a baseline human pose estimation network extracts a noisy feature representation of each human keypoint from images containing occlusion. Then, an occlusion part prediction module, proposed in this study, infers which keypoints are occluded and produces a visibility vector. The module consists of two submodules: an occlusion part classification network, which predicts the occlusion state of each keypoint, and a visibility encoder, which uses a channel attention mechanism to convert the predicted occlusion states into a set of weights. Finally, the visibility vector and the noisy features are fused by channel re-weighting to obtain keypoint-level occlusion-aware features, from which the keypoint heatmaps are computed. Experimental results on the MPII and LSP (Leeds Sports Pose) datasets show that, compared with the baseline network, the proposed method handles occlusion better at a small extra computational cost and outperforms existing state-of-the-art methods.
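The abstract does not give the visibility encoder's exact architecture, so the following is only a minimal sketch of how predicted occlusion states could be turned into channel weights and fused with the features. It assumes a squeeze-and-excitation style encoder (fully connected, ReLU, fully connected, sigmoid) and represents each feature channel as a flattened spatial map. The function names and the weights in the usage example are hypothetical; in a trained network these weights would be learned.

```python
import math
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def linear(x: Vector, weight: Matrix, bias: Vector) -> Vector:
    # Fully connected layer: weight has shape (out_dim, in_dim).
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weight, bias)]


def visibility_encoder(occlusion_probs: Vector, w1: Matrix, b1: Vector,
                       w2: Matrix, b2: Vector) -> Vector:
    # Assumed SE-style channel attention: FC -> ReLU -> FC -> sigmoid,
    # mapping K per-keypoint occlusion probabilities to C channel weights in (0, 1).
    hidden = [max(0.0, h) for h in linear(occlusion_probs, w1, b1)]
    return [sigmoid(z) for z in linear(hidden, w2, b2)]


def channel_reweight(features: Matrix, visibility: Vector) -> Matrix:
    # Channel re-weighting: scale every value of channel c by visibility[c].
    if len(features) != len(visibility):
        raise ValueError("need exactly one visibility weight per feature channel")
    return [[v * x for x in channel] for channel, v in zip(features, visibility)]


if __name__ == "__main__":
    occ = [0.9, 0.1]  # hypothetical: keypoint 0 likely occluded, keypoint 1 visible
    w1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]  # hypothetical weights
    w2, b2 = [[-4.0, 0.0], [0.0, -4.0], [0.0, 0.0]], [2.0, 2.0, 0.0]
    vis = visibility_encoder(occ, w1, b1, w2, b2)
    fused = channel_reweight([[1.0, 2.0], [1.0, 2.0], [4.0]], vis)
    print(vis)
    print(fused)
```

With these illustrative weights, the channel tied to the likely occluded keypoint receives a smaller weight than the one tied to the visible keypoint. The noisy features of occluded parts are therefore suppressed rather than discarded.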