A Novel Video Caption Detection Approach Using Multi-Frame Integration

Wang Rongrong, Jin Wanjun, and Wu Lide   

(Media Computing and Web Intelligence Laboratory, Department of Computer Science and Engineering, Fudan University, Shanghai 200433)
Online: 2005-07-15

Abstract: Captions in videos often play an important role in video indexing and retrieval. This paper presents a novel video caption detection approach. The approach first applies a new multi-frame integration (MFI) method to reduce the complexity of the image background: a minimum (or maximum) pixel value search is performed along the time axis, and a Sobel edge map is used to determine the search mode. Block-based text detection is then performed, i.e., a small window scans the image and each window position is classified as text or non-text, using Sobel edges as features. A two-level pyramid is applied to detect text of various sizes. Finally, a new iterative text line decomposition method extracts accurate text bounding boxes from the candidate text areas. Experimental results show that the proposed approach achieves high precision and recall.

Key words: caption detection, video, multi-frame integration, Sobel edge, iterative text region decomposition
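
The multi-frame integration step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes aligned grayscale frames stored as nested lists, and the names `integrate_frames` and `sobel_magnitude` are hypothetical. The abstract does not say how the Sobel edge map selects the search mode, so here the mode is passed in explicitly. The underlying idea is that a static caption keeps its value over time while the moving background does not, so a per-pixel minimum (for bright captions) or maximum (for dark captions) flattens the background.

```python
from typing import List

Frame = List[List[int]]  # grayscale image: rows of 0-255 intensities


def integrate_frames(frames: List[Frame], mode: str = "min") -> Frame:
    """Per-pixel minimum (or maximum) over time across aligned frames.

    Assumption: mode is chosen by the caller; the paper derives it from
    a Sobel edge map, by a rule the abstract does not describe.
    """
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    height, width = len(frames[0]), len(frames[0][0])
    return [[pick(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def sobel_magnitude(img: Frame) -> Frame:
    """Sobel gradient magnitude approximated as |gx| + |gy|.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical derivative: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out
```

Calling `integrate_frames(frames, "min")` on frames where a caption pixel stays at 255 while its neighbours vary keeps the caption at 255 and lowers each neighbour to its darkest observed value. `sobel_magnitude` applied to the integrated frame then yields the kind of edge features that the block-based detection stage relies on.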