Abstract:
Superimposed texts bring important semantic clues for video indexing and retrieval. Texts in videos often span tens or even hundreds of frames and many researchers have exploited the temporal redundancy of video text to improve the text detection accuracy and the text region quality. Described in this paper is a novel approach to track and segment static superimposed texts by utilizing multiple video frame information. For text detection, multiple frames are used to verify the appearance of the text regions which have been detected on a single frame. A binary-search based text tracking method is proposed, which can track the static text object efficiently by utilizing the features of the edge bit map. In order to refine the text regions, text detection is performed again on a synthesized image, which is produced by minimum/maximum pixel search on consecutive tracked frames. In text segmentation, edge features are exploited to further remove complex background in addition to traditional gray-value integration. Experimental results show the effectiveness of the proposed method.