Abstract:
Pedestrian images captured by surveillance equipment in natural scenes are frequently occluded by various obstacles, which makes occlusion a major challenge for person re-identification. To address this problem, a method based on spatial attention and pose estimation (SAPE) is proposed. To account for both global and local features, a multi-task network is constructed that represents features at multiple granularities. A spatial attention mechanism directs the region of interest toward the spatial semantic information in the image, so that visual knowledge useful for re-identification is mined from the global structural pattern. Then, following the idea of part matching, the feature map extracted by the residual network is divided evenly into several horizontal parts, and matching these local features increases the granularity of identification. On this basis, the pedestrian key information extracted from the image by an improved pose estimator is fused with the feature map extracted by the convolutional neural network, and a threshold is set to remove occluded regions. This yields strongly discriminative features and reduces the influence of occlusion on the re-identification results. We verify the effectiveness of the SAPE model on three datasets: Occluded-DukeMTMC, Occluded-REID, and Partial-REID. The experimental results indicate that SAPE performs well on occluded person re-identification.
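The part-matching and occlusion-removal steps can be illustrated with a minimal sketch. It is not the SAPE implementation: it assumes a feature map stored as nested lists (channels x height x width), keypoints given as (row, confidence) pairs in feature-map coordinates, and a simplified rule in which a horizontal stripe counts as visible when at least one keypoint above the threshold falls inside it. The function names, the data layout, and this stripe-level masking rule are all hypothetical and stand in for the fusion of pose information with CNN features described above.

```python
from typing import List, Tuple

FeatureMap = List[List[List[float]]]  # channels x height x width (assumed layout)


def horizontal_part_features(fmap: FeatureMap, num_parts: int) -> List[List[float]]:
    """Split the map into equal horizontal stripes and average-pool each
    stripe into one vector with one value per channel."""
    height = len(fmap[0])
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    rows_per_part = height // num_parts
    parts = []
    for p in range(num_parts):
        top, bottom = p * rows_per_part, (p + 1) * rows_per_part
        vec = []
        for channel in fmap:
            values = [v for row in channel[top:bottom] for v in row]
            vec.append(sum(values) / len(values))
        parts.append(vec)
    return parts


def visible_parts(keypoints: List[Tuple[float, float]], height: int,
                  num_parts: int, threshold: float) -> List[bool]:
    """Mark a stripe visible if a keypoint with confidence >= threshold
    lies in it (a simplifying assumption, not the paper's rule)."""
    if height % num_parts != 0:
        raise ValueError("height must be divisible by num_parts")
    rows_per_part = height // num_parts
    visible = [False] * num_parts
    for y, conf in keypoints:
        if conf >= threshold and 0 <= y < height:
            visible[int(y) // rows_per_part] = True
    return visible


def masked_part_features(parts: List[List[float]],
                         visible: List[bool]) -> List[List[float]]:
    """Zero out the vectors of stripes judged to be occluded."""
    return [vec if keep else [0.0] * len(vec) for vec, keep in zip(parts, visible)]


# Toy input: one channel, four rows, two columns.
fmap = [[[1.0, 1.0], [3.0, 3.0], [5.0, 5.0], [7.0, 7.0]]]
parts = horizontal_part_features(fmap, num_parts=2)
# A confident keypoint in the upper stripe, a low-confidence one in the lower stripe.
visible = visible_parts([(0, 0.9), (3, 0.1)], height=4, num_parts=2, threshold=0.5)
masked = masked_part_features(parts, visible)
```

In this toy input the lower stripe contains only a low-confidence keypoint, so its feature vector is zeroed and only the upper stripe would contribute to matching.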