ISSN 1000-1239 CN 11-1777/TP


Frame Line Detection and Removal Algorithm for Form-Type Bills

Zhang Yan1, Yu Shengyang2, Zhang Chongyang1, Yang Jingyu1

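  1(School of Computer Science and Technology, Nanjing University of Science and Technology, Nanjing 210094) 2(Institute of Image Processing and Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240) (zhangy@njust.edu.cn)
  • Published: 2008-05-15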

Extraction and Removal of Frame Line in Form Bill

Zhang Yan1, Yu Shengyang2, Zhang Chongyang1, and Yang Jingyu1

  1(School of Computer Science and Technology, Nanjing University of Science & Technology, Nanjing 210094) 2(Institute of Image Processing & Pattern Recognition, Shanghai Jiao Tong University, Shanghai 200240)
  • Online: 2008-05-15

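Abstract: Character strokes that touch or overlap form lines are common in form-type bills and seriously degrade the performance of subsequent automatic bill recognition. Most existing methods work on binary images and do not make full use of the frame line features present in grayscale images. Based on the frame line features of bill images, a frame line detection and removal algorithm for the preprocessing of form-type bills is proposed. The algorithm first detects the frame lines accurately by exploiting the characteristics of the grayscale bill image, then describes the overlapped frame line regions with a connected chain structure, then identifies and marks the overlaps, and finally uses the marks to preserve character strokes while removing the frame lines. Tests on real bank check images demonstrate the effectiveness and robustness of the algorithm.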

Key words: document analysis, form recognition, line detection, connected chain structure, frame line removal

Abstract: In practical form bill images, characters often overlap with the form frame lines, which greatly degrades the performance of automatic document image processing systems. Most frame line removal algorithms operate on binary images and therefore cannot make good use of the line characteristics available in gray images. Based on the structural attributes of financial documents, this paper proposes an improved line detection and removal algorithm for financial form image preprocessing. To reduce complexity and improve the quality of line removal, line detection and line removal are carried out as separate steps. First, frame lines are accurately detected according to their characteristics in gray images. Then a chain code method is used to describe the frame line region. Next, the crosspoints of characters and lines are detected with a deterministic finite automaton in order to analyse the overlapping types. Finally, frame lines are removed according to the marks produced during crosspoint detection. In this way, the stroke distortion caused by thresholding is avoided and a higher accuracy of line removal can be achieved. Experimental comparisons with several existing methods, based on handwritten digit character recognition, show that the proposed algorithm is efficient and robust.

Key words: document analysis, form recognition, line detection, chain code, frame line removal
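
The pipeline summarized in the abstract (detect frame lines in the gray image, mark where strokes overlap them, then erase only the unmarked line pixels) can be illustrated with a small sketch. The abstract does not give the detection rule, the chain code layout, or the automaton's states, so everything below is an assumption made for illustration: a row projection profile stands in for the paper's gray-level line detector, a two-state scanner stands in for its deterministic finite automaton, and all function names, thresholds, and the "cross"/"touch" labels are hypothetical. Only horizontal lines are handled.

```python
def detect_line_bands(gray, dark_thresh=128, min_fraction=0.8):
    """Return (top, bottom) row ranges in which most pixels are dark.

    Assumption: a frame line spans most of the image width, so a simple
    row projection is enough. This is NOT the paper's detector.
    """
    bands, start = [], None
    for y, row in enumerate(gray):
        dark = sum(1 for v in row if v < dark_thresh)
        is_line_row = dark >= min_fraction * len(row)
        if is_line_row and start is None:
            start = y                      # a new band begins
        elif not is_line_row and start is not None:
            bands.append((start, y - 1))   # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(gray) - 1))
    return bands


def column_symbol(gray, band, x, dark_thresh=128):
    """Classify one column of a band: 'X' crossing, 'T' touching, 'L' line only."""
    top, bottom = band
    above = top > 0 and gray[top - 1][x] < dark_thresh
    below = bottom + 1 < len(gray) and gray[bottom + 1][x] < dark_thresh
    if above and below:
        return "X"
    if above or below:
        return "T"
    return "L"


def find_overlaps(gray, band, dark_thresh=128):
    """Scan the band left to right with a two-state automaton.

    States: LINE (no stroke nearby) and OVERLAP (inside a stroke run).
    Returns a list of (x_start, x_end, kind) with kind 'cross' or 'touch'.
    """
    runs = []
    state, start, kind = "LINE", None, None
    width = len(gray[0])
    for x in range(width + 1):
        # One extra 'L' symbol at the end flushes a run that reaches the border.
        sym = column_symbol(gray, band, x, dark_thresh) if x < width else "L"
        if state == "LINE":
            if sym != "L":
                state, start, kind = "OVERLAP", x, sym
        elif sym == "L":
            runs.append((start, x - 1, "cross" if kind == "X" else "touch"))
            state = "LINE"
        elif sym == "X":
            kind = "X"                     # any crossing column upgrades the run
    return runs


def remove_line(gray, band, runs, background=255):
    """Return a copy of the image with the band erased outside the marked runs."""
    keep = set()
    for x0, x1, _ in runs:
        keep.update(range(x0, x1 + 1))
    top, bottom = band
    out = [row[:] for row in gray]
    for y in range(top, bottom + 1):
        for x in range(len(out[y])):
            if x not in keep:
                out[y][x] = background
    return out


if __name__ == "__main__":
    # 7x10 white image, one horizontal line, one crossing stroke, one touching stroke.
    img = [[255] * 10 for _ in range(7)]
    img[3] = [0] * 10
    for y in range(1, 6):
        img[y][4] = 0
    for y in (1, 2):
        img[y][7] = 0
    for band in detect_line_bands(img):
        marks = find_overlaps(img, band)
        img = remove_line(img, band, marks)
    for row in img:
        print("".join("#" if v < 128 else "." for v in row))
```

Keeping detection, marking, and removal as separate functions mirrors the abstract's point that the stages run separately, and working on gray values rather than a thresholded image means no stroke pixels are lost to binarization before the overlaps are marked.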