Abstract:
In crowd counting, the high cost of data annotation limits the broad application of fully supervised methods. To reduce reliance on labeled data, semi-supervised counting methods that exploit large amounts of unlabeled data have become a mainstream research direction. However, existing semi-supervised approaches typically rely on iteratively generating pseudo-labels for training, and their performance is constrained by two types of uncertainty: 1) epistemic uncertainty (the model’s inherent lack of knowledge), which makes the quality of generated pseudo-labels unstable; and 2) aleatoric uncertainty (inherent noise and ambiguity in the data), which leaves the model susceptible to background interference and distribution bias. To address these challenges, we propose the progressive cognition-guided dual-domain network for semi-supervised crowd counting (PCDNet). PCDNet introduces a progressive cognition-guided pseudo-label refinement mechanism that applies multi-granularity filtering to remove low-quality pseudo-labels and thereby mitigate epistemic uncertainty. It further introduces a spatial-frequency dual-domain joint counting loss. By requiring the model to learn consistent and robust feature representations in both the spatial and frequency domains, this loss constrains the aleatoric uncertainty arising from data bias and improves the model’s generalization to complex scenes. Extensive experiments on four mainstream crowd counting datasets show that PCDNet outperforms existing semi-supervised methods, with the advantage most pronounced when labeled data is scarce. The results support the effectiveness of the proposed method in addressing epistemic and aleatoric uncertainty, generating high-quality pseudo-labels, and improving model robustness and generalization.
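
To make the idea of a spatial-frequency dual-domain loss concrete, the following is a minimal sketch of one generic form such a loss could take: a mean squared error on the density map plus a weighted mean absolute difference between the 2D Fourier spectra of the prediction and the target. The abstract does not specify the actual formulation used in PCDNet, so the function names `dft2` and `dual_domain_loss`, the weight `freq_weight`, and the choice of error terms are all assumptions made for illustration only.

```python
import cmath


def dft2(img):
    """Naive 2D discrete Fourier transform of a small 2D list.

    O(N^4) and meant for illustration only; a real pipeline would use an FFT.
    """
    h, w = len(img), len(img[0])
    out = [[0j] * w for _ in range(h)]
    for u in range(h):
        for v in range(w):
            acc = 0j
            for y in range(h):
                for x in range(w):
                    angle = -2 * cmath.pi * (u * y / h + v * x / w)
                    acc += img[y][x] * cmath.exp(1j * angle)
            out[u][v] = acc
    return out


def dual_domain_loss(pred, target, freq_weight=0.1):
    """Hypothetical dual-domain loss: spatial MSE + weighted spectral L1.

    `freq_weight` is an assumed balancing hyperparameter, not a value
    taken from the paper.
    """
    h, w = len(pred), len(pred[0])
    n = h * w
    # Spatial-domain term: mean squared error between density maps.
    spatial = sum(
        (pred[y][x] - target[y][x]) ** 2 for y in range(h) for x in range(w)
    ) / n
    # Frequency-domain term: mean magnitude of the spectral difference.
    f_pred, f_target = dft2(pred), dft2(target)
    freq = sum(
        abs(f_pred[u][v] - f_target[u][v]) for u in range(h) for v in range(w)
    ) / n
    return spatial + freq_weight * freq


# Toy 2x2 density maps: one person in the target, smeared in the prediction.
target = [[0.0, 1.0], [0.0, 0.0]]
pred = [[0.0, 0.5], [0.0, 0.5]]

loss_same = dual_domain_loss(target, target)
loss_diff = dual_domain_loss(pred, target)
print(loss_same, loss_diff)
```

In this toy case both maps sum to the same count, yet the loss is nonzero because the spatial and spectral terms both penalize the misplaced density; this is the kind of complementary constraint a dual-domain loss is meant to provide.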