A Novel Video Caption Detection Approach Using Multi-Frame Integration

Wang Rongrong (王蓉蓉), Jin Wanjun (金万军), and Wu Lide (吴立德)

Abstract: Captions in videos often play an important role in video indexing and retrieval. This paper presents a novel video caption detection approach. The approach first applies a new multi-frame integration (MFI) method to reduce the complexity of the image background: a minimum (or maximum) pixel value search is performed over the frames in temporal order, and a Sobel edge map determines which of the two search modes is used. Block-based text detection is then performed: a small window scans the image, and each window position is classified as text or non-text using Sobel edges as features. A two-level pyramid is used to detect text of different sizes. Finally, a new iterative text-region decomposition method is proposed, which locates the boundaries of text regions more precisely and extracts accurate text bounding boxes from the candidate text areas. Experimental results show that the proposed approach achieves high precision and recall.
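The first two ingredients named in the abstract, the temporal minimum (or maximum) pixel search and the Sobel edge map, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `integrate_frames` and `sobel_magnitude`, the list-of-lists grayscale frame layout, and the `|Gx| + |Gy|` magnitude approximation are all assumptions made for illustration. The abstract does not say how the Sobel edge map selects the search mode, so the sketch simply takes the mode as an argument.

```python
def integrate_frames(frames, mode="min"):
    """Per-pixel minimum or maximum over a time-ordered list of frames.

    Each frame is a list of rows of grayscale values (hypothetical layout);
    all frames are assumed to have the same size.
    """
    if not frames:
        raise ValueError("need at least one frame")
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    height, width = len(frames[0]), len(frames[0][0])
    return [
        [pick(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]


def sobel_magnitude(img):
    """Approximate gradient magnitude |Gx| + |Gy| using 3x3 Sobel kernels.

    Border pixels are left at 0 because the kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal derivative: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y][x - 1] + img[y + 1][x - 1]
            )
            # Vertical derivative: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]) - (
                img[y - 1][x - 1] + 2 * img[y - 1][x] + img[y - 1][x + 1]
            )
            out[y][x] = abs(gx) + abs(gy)
    return out
```

A plausible reading, stated here as an assumption rather than as the paper's rule, is that a minimum search suits bright stationary captions (the text stays bright while a changing background tends toward dark values) and a maximum search suits dark captions. Calling `sobel_magnitude(integrate_frames(frames, "min"))` would then give an edge map of the simplified image from which window-level features could be computed.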
