ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2015, Vol. 52 ›› Issue (11): 2431-2440. doi: 10.7544/issn1000-1239.2015.20140492

• Artificial Intelligence •



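  1. 1(Department of Electronic and Information Engineering, Changsha Normal University, Changsha 410100); 2(College of Information Science and Engineering, Hunan University, Changsha 410082)
  • Published: 2015-11-01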

A Hierarchical Model for Joint Object Detection and Pose Estimation

Chen Yaodong1,2, Li Renfa2   

  1. 1(Department of Electronic and Information Engineering, Changsha Normal University, Changsha 410100); 2(College of Computer Science and Electronic Engineering, Hunan University, Changsha 410082)
  • Online: 2015-11-01

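Abstract (Chinese-language version): Object detection and pose estimation are currently treated as separate tasks in vision research, yet the two are strongly complementary in both methodology and practical application. This paper proposes a mixture of hierarchical tree models containing three types of nodes, which describe the whole object, discriminative parts, and components (i.e., semantic parts), respectively. The discriminative parts in the middle layer link the layer above (the object) with the layer below (the components): they characterize local features of the whole object and implicitly encode co-occurrence information among multiple components. Compared with recent joint models, the hierarchical tree model handles detection and estimation in parallel, avoiding the error propagation caused by cascaded joint processing. The model is trained with a latent-variable structured support vector machine, and a new part-learning method is proposed to initialize and optimize the discriminative parts automatically. The experiments use two evaluation scenarios, multi-task recognition and single-task recognition, to compare the proposed model with mainstream joint recognition models; the results indicate that the hierarchical model offers stronger recognition performance and higher time efficiency.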

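Key words (Chinese-language version): joint recognition model, pose estimation, object detection, part-based model, structured support vector machine, latent variable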
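
For readers unfamiliar with the training method named in the abstract, the generic latent structured SVM objective is usually written as below. This is the textbook form, not necessarily the exact objective used in the paper. The symbols are assumptions introduced here for illustration: x_i is a training image, y_i its annotated output (for example, the object box and component locations), h the latent placements of the discriminative parts, \Phi a joint feature map, \Delta a task loss, and C a regularization constant.

```latex
\min_{w}\; \frac{1}{2}\lVert w\rVert^{2}
+ C\sum_{i=1}^{n}\Big(
  \max_{(\hat{y},\hat{h})}\big[\, w^{\top}\Phi(x_i,\hat{y},\hat{h}) + \Delta(y_i,\hat{y}) \,\big]
  - \max_{h}\, w^{\top}\Phi(x_i,y_i,h)
\Big)
```

The first maximization searches for the most violating output together with its latent assignment; the second fills in the best latent assignment for the ground-truth output. The difference of the two terms makes the objective non-convex, which is why such models are typically trained by alternating between completing the latent variables and solving a standard structured SVM.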

Abstract: Object detection and pose estimation are treated as different tasks in computer vision, yet they are highly complementary in both research methods and practical applications. This paper presents a mixture of hierarchical tree models that consists of three types of nodes, representing the whole object, discriminative parts, and components (i.e., semantic parts), respectively. A key point of the model is that the discriminative parts in the middle level characterize not only object features but also mutual information among components. The proposed model detects articulated objects and estimates their poses in parallel, which addresses the error propagation problem of previous joint models. To train the model, we use a latent structured SVM in which the discriminative nodes are treated as latent variables. A novel learning method is introduced to initialize and optimize the parameters of the discriminative parts automatically. In the experiments, we design two evaluation scenarios (multi-task recognition and single-task recognition) and compare the proposed model with state-of-the-art joint methods on the PASCAL VOC datasets. The results show that the hierarchical model not only achieves higher recognition rates than other joint models but also is more time-efficient.

Key words: multi-task recognition model, pose estimation, object detection, part-based models, structured SVM, latent variables
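
The three-level tree described in the abstract (object, discriminative parts, components) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the authors' implementation: the `Node` class, the quadratic deformation penalty, the candidate locations, and all scores are hypothetical. The sketch only shows how a single max-sum pass over a tree yields the object location (detection) and the component locations (pose) together, rather than running one task after the other.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Loc = Tuple[int, int]


@dataclass
class Node:
    """One node of a hypothetical three-level tree."""
    name: str
    kind: str                  # "object", "part", or "component"
    unary: Dict[Loc, float]    # made-up appearance score per candidate location
    offset: Loc = (0, 0)       # ideal displacement from the parent node
    weight: float = 1.0        # deformation penalty weight (assumed quadratic)
    children: List["Node"] = field(default_factory=list)


def pair_score(child: Node, parent_loc: Loc, child_loc: Loc) -> float:
    """Penalize deviation of the child from its ideal offset to the parent."""
    dx = child_loc[0] - parent_loc[0] - child.offset[0]
    dy = child_loc[1] - parent_loc[1] - child.offset[1]
    return -child.weight * (dx * dx + dy * dy)


def best_subtree(node: Node, loc: Loc) -> Tuple[float, Dict[str, Loc]]:
    """Best score and placements of the subtree rooted at `node` placed at `loc`."""
    total = node.unary[loc]
    placement = {node.name: loc}
    for child in node.children:
        best = None
        for cloc in child.unary:
            score, sub = best_subtree(child, cloc)
            score += pair_score(child, loc, cloc)
            if best is None or score > best[0]:
                best = (score, sub)
        total += best[0]
        placement.update(best[1])
    return total, placement


def infer(root: Node) -> Tuple[float, Dict[str, Loc]]:
    """Jointly pick the object location and all part/component locations."""
    return max((best_subtree(root, loc) for loc in root.unary), key=lambda t: t[0])


# Toy instance: one object, one discriminative part, two components.
head = Node("head", "component", {(0, -2): 1.0, (5, 5): 0.2}, offset=(0, -2))
torso = Node("torso", "component", {(0, 1): 0.8, (3, 3): 0.9}, offset=(0, 1))
upper = Node("upper", "part", {(0, 0): 0.5, (4, 4): 0.4}, children=[head, torso])
person = Node("person", "object", {(0, 0): 1.0, (4, 4): 0.3}, children=[upper])

score, placement = infer(person)
print(score, placement)
```

The recursion recomputes child subtrees for every parent location, which is acceptable for a toy tree of depth three; a practical implementation would cache the per-location subtree scores or use distance transforms.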