ISSN 1000-1239 CN 11-1777/TP

Journal of Computer Research and Development (计算机研究与发展) ›› 2019, Vol. 56 ›› Issue (7): 1525-1533. doi: 10.7544/issn1000-1239.2019.20180543

• Information Processing •

Chinese Text Extraction Method of Natural Scene Images Based on City Monitoring

Xiao Ke1, Dai Shun1, He Yunhua1, Sun Limin2

  1(School of Information Science and Technology, North China University of Technology, Beijing 100144); 2(Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093) (zehan_xiao@163.com)
  • Published: 2019-07-01
  • Online: 2019-07-01
  • Supported by:
    the National Key Research and Development Program of China (2017YFB0802300); the National Natural Science Foundation of China (61802005); the Beijing Natural Science Foundation (4184085)

Abstract: Efficient monitoring of urban scenes and analysis of the resulting information is a primary task of smart cities. Recognizing text in scene images is an intuitive and efficient way to analyze scene information. However, current methods for extracting Chinese text from scene images perform poorly because of uneven illumination, image blur, and the complex structure of Chinese characters. To address this problem, this paper proposes an edge-enhanced maximally stable extremal regions (MSER) detection method that can extract MSERs under the influence of illumination and blur. Geometric feature constraints then efficiently filter out obvious non-MSERs, yielding high-quality candidate MSERs. Next, the proposed central aggregation method merges each candidate Chinese text region that has been split into multiple MSERs, so that the region becomes a single candidate Chinese text component. These components are then analyzed, and machine learning is used to select the correct Chinese text. Experimental results show that the algorithm extracts Chinese text from natural scene images more effectively.

Key words: text extraction, maximally stable extremal regions (MSER), Chinese aggregation, support vector machine (SVM), Internet of things (IoT)
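
The two middle stages of the pipeline described in the abstract, geometric filtering of candidate regions and aggregation of fragments into single character components, can be illustrated with a small sketch. It is not the authors' implementation: the abstract gives neither the actual constraints nor the aggregation rule, so the `Box` type, the aspect-ratio and area thresholds, and the center-distance merge criterion below are all hypothetical choices made for illustration. Candidate regions are assumed to arrive as axis-aligned bounding boxes from some MSER detector.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box of one candidate region (hypothetical type)."""
    x: int  # left edge
    y: int  # top edge
    w: int  # width
    h: int  # height

    @property
    def center(self):
        return (self.x + self.w / 2, self.y + self.h / 2)


def passes_geometry(box: Box, image_area: int,
                    min_aspect: float = 0.2, max_aspect: float = 5.0,
                    min_area_frac: float = 1e-4, max_area_frac: float = 0.3) -> bool:
    """Reject boxes whose shape or size is implausible for text.

    All thresholds are illustrative assumptions, not values from the paper.
    """
    if box.w <= 0 or box.h <= 0:
        return False
    aspect = box.w / box.h
    area_frac = (box.w * box.h) / image_area
    return (min_aspect <= aspect <= max_aspect
            and min_area_frac <= area_frac <= max_area_frac)


def union(a: Box, b: Box) -> Box:
    """Smallest box containing both a and b."""
    x0, y0 = min(a.x, b.x), min(a.y, b.y)
    x1 = max(a.x + a.w, b.x + b.w)
    y1 = max(a.y + a.h, b.y + b.h)
    return Box(x0, y0, x1 - x0, y1 - y0)


def centers_close(a: Box, b: Box, ratio: float) -> bool:
    """Assumed merge rule: centers closer than ratio * longest side of either box."""
    (ax, ay), (bx, by) = a.center, b.center
    limit = ratio * max(a.w, a.h, b.w, b.h)
    return math.hypot(ax - bx, ay - by) < limit


def aggregate_by_center(boxes: List[Box], ratio: float = 0.6) -> List[Box]:
    """Greedily merge nearby fragments until no pair satisfies the merge rule."""
    groups = list(boxes)
    merged = True
    while merged:
        merged = False
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                if centers_close(groups[i], groups[j], ratio):
                    groups[i] = union(groups[i], groups[j])
                    del groups[j]
                    merged = True
                    break
            if merged:
                break
    return groups


# Two fragments of one character, one separate character, one long thin line.
candidates = [Box(10, 10, 8, 20), Box(20, 10, 10, 20),
              Box(100, 10, 20, 20), Box(0, 50, 200, 2)]
filtered = [b for b in candidates if passes_geometry(b, image_area=200 * 100)]
components = aggregate_by_center(filtered)
print(components)
```

In this toy input the thin horizontal line fails the aspect-ratio check, and the two adjacent fragments are merged into one box. In a full pipeline each resulting component would then be passed to a trained classifier, such as the SVM named in the key words, to decide whether it is Chinese text.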

CLC number: