Abstract:
Object detection is a pivotal component of computer vision and plays a vital role in diverse practical applications. Over decades of evolution, the field has progressed from early methods that relied on handcrafted feature extraction to the widespread adoption of deep learning models. However, systematic reviews that trace the development of object detection through improvements in deep learning foundation models are still lacking. To address this gap, this paper organizes the technological evolution of object detection around the progression of foundation models in artificial intelligence. We systematically survey detection models built upon various foundation models, compare their strengths and weaknesses, and analyze their improvement strategies. We also review evaluation metrics and technological advances across different eras, with particular emphasis on the performance gains driven by deep learning. We discuss persistent challenges in handling diverse scenarios, improving real-time efficiency, and enhancing accuracy. Finally, we explore prospective research directions, including model generalization, computational efficiency, and integration with complex tasks, and propose potential enhancement strategies. This work aims to provide a clear perspective on this technological evolution to facilitate further research and applications in object detection.