A Novel Video Caption Detection Approach Using Multi-Frame Integration

Wang Rongrong (王蓉蓉), Jin Wanjun (金万军), and Wu Lide (吴立德)

(Media Computing and Web Intelligence Laboratory, Department of Computer Science and Engineering, Fudan University, Shanghai 200433) (rrwang@fudan.edu.cn)

Publication history
- Published: 2005-07-14

Abstract

Captions in video often play an important role in video indexing and retrieval. This paper presents a novel video caption detection approach. The approach first applies a new multi-frame integration (MFI) method to reduce the complexity of the image background: it searches for the minimum (or maximum) pixel value across a temporal sequence of frames, and a Sobel edge map determines which search mode is used. Block-based text detection is then performed: a small window scans the image, and each window position is classified as text or non-text using Sobel edges as features. A two-level pyramid is used to detect text of different sizes. Finally, a new iterative text region decomposition method locates the boundaries of the text regions more precisely, extracting accurate text bounding boxes from the candidate text areas. Experimental results show that the proposed approach achieves high precision and recall.

Keywords:
- caption detection
- video
- multi-frame integration (MFI)
- Sobel edge
- iterative text region decomposition
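
The abstract above only outlines the first two stages, so the following Python sketch is an illustration under stated assumptions and not the authors' implementation. It shows a pixel-wise minimum (or maximum) search over a sequence of frames, followed by a block scan that flags windows containing many strong Sobel edges. All function names, the window size, both thresholds, the |Gx| + |Gy| magnitude approximation, and the non-overlapping scan step are hypothetical. In the paper the search mode is chosen from a Sobel edge map; since the abstract does not say how, the sketch simply takes the mode as an argument. The two-level pyramid and the iterative text region decomposition are not shown.

```python
from typing import List, Tuple

Frame = List[List[int]]  # grayscale image: rows of 0-255 intensities


def integrate_frames(frames: List[Frame], mode: str = "min") -> Frame:
    """Pixel-wise minimum or maximum over a temporal sequence of frames.

    Assumption: the caller picks the mode (e.g. "min" for bright captions,
    which stay bright while a changing background is pushed darker).
    """
    if not frames:
        raise ValueError("need at least one frame")
    if mode not in ("min", "max"):
        raise ValueError("mode must be 'min' or 'max'")
    pick = min if mode == "min" else max
    height, width = len(frames[0]), len(frames[0][0])
    return [[pick(f[y][x] for f in frames) for x in range(width)]
            for y in range(height)]


def sobel_magnitude(img: Frame) -> Frame:
    """Sobel gradient magnitude approximated as |Gx| + |Gy|.

    Border pixels are left at 0 because the 3x3 kernel does not fit there.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            out[y][x] = abs(gx) + abs(gy)
    return out


def text_blocks(edges: Frame, win: int = 4, edge_thresh: int = 200,
                density: float = 0.25) -> List[Tuple[int, int]]:
    """Scan non-overlapping win x win blocks and return the (x, y) corners
    of blocks whose fraction of strong edge pixels reaches `density`.

    Hypothetical stand-in for the paper's text / non-text classifier.
    """
    h, w = len(edges), len(edges[0])
    hits = []
    for y in range(0, h - win + 1, win):
        for x in range(0, w - win + 1, win):
            strong = sum(1 for dy in range(win) for dx in range(win)
                         if edges[y + dy][x + dx] >= edge_thresh)
            if strong / (win * win) >= density:
                hits.append((x, y))
    return hits
```

A typical call chain under these assumptions would be `text_blocks(sobel_magnitude(integrate_frames(frames, "min")))`, where `frames` holds consecutive grayscale frames in which the same caption is visible.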