• DocumentCode
    3695082
  • Title
    Text line extraction in document images
  • Author
    Liuan Wang;Wei Fan;Jun Sun;Satoshi Naoi;Tanaka Hiroshi
  • Author_Institution
    Fujitsu Research &
  • fYear
    2015
  • Firstpage
    191
  • Lastpage
    195
  • Abstract
    Text line extraction in document images is an important prerequisite for many content-based image understanding applications. In this paper, we propose an accurate and robust method for generic text line extraction that can be applied to a wide range of document image categories, diverse languages, and text lines with different orientations. First, candidate connected components are extracted from the document image using Maximally Stable Extremal Regions (MSER), and noise components are filtered out by AdaBoost and a Convolutional Neural Network (CNN). Then, coarse text lines are generated by hierarchical edge reconstruction and are cut according to the local linearity of text lines in the document spanning tree. Finally, for accurate text line extraction, the multi-component fragments produced by the cut are re-connected by minimizing a text line energy defined in terms of text line consistency and fitting error. Experimental results on a multilingual test dataset demonstrate the effectiveness and robustness of the proposed method, which achieves higher performance than state-of-the-art methods.
  • Keywords
    "Robustness","Benchmark testing","Image segmentation","Surveillance","Image recognition","Integrated optics","Optical imaging"
  • Publisher
    ieee
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
  • Type
    conf
  • DOI
    10.1109/ICDAR.2015.7333750
  • Filename
    7333750
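The abstract names the pipeline stages without detailing them. As a rough illustration of the middle stage only (building a spanning tree over connected components and cutting it where text lines stop being locally linear), the following Python sketch may help. It is not the authors' algorithm: it assumes each connected component has already been reduced to a centroid point (MSER extraction and AdaBoost/CNN filtering are skipped), builds a plain Euclidean minimum spanning tree with Prim's algorithm in place of the paper's hierarchical edge reconstruction, and applies a hypothetical cut rule with an assumed `max_angle_deg` threshold. All function names are illustrative.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # centroid of one connected component (assumed input)
Edge = Tuple[int, int]       # pair of indices into the point list


def minimum_spanning_tree(points: List[Point]) -> List[Edge]:
    """Euclidean MST over component centroids using Prim's algorithm, O(n^2)."""
    n = len(points)
    if n == 0:
        return []
    in_tree = [False] * n
    best = [math.inf] * n  # cheapest known link from each vertex to the tree
    parent = [-1] * n
    best[0] = 0.0
    edges: List[Edge] = []
    for _ in range(n):
        # Add the closest vertex not yet in the tree (lowest index wins ties).
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((parent[u], u))
        # Relax links from the newly added vertex to the remaining vertices.
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges


def orientation(points: List[Point], edge: Edge) -> float:
    """Undirected edge orientation in degrees, folded into [0, 180)."""
    (x0, y0), (x1, y1) = points[edge[0]], points[edge[1]]
    return math.degrees(math.atan2(y1 - y0, x1 - x0)) % 180.0


def angle_gap(a: float, b: float) -> float:
    """Smallest difference between two undirected orientations, in degrees."""
    d = abs(a - b) % 180.0
    return min(d, 180.0 - d)


def cut_by_local_linearity(points: List[Point], edges: List[Edge],
                           max_angle_deg: float = 20.0) -> List[Edge]:
    """Hypothetical cut rule: keep an edge if it is roughly collinear with an
    edge sharing one of its endpoints, or if it is the shortest edge at one of
    its endpoints (this also keeps leaf edges). Every other edge is treated as
    a bridge between two lines and removed."""
    incident: Dict[int, List[int]] = {}
    for k, (u, v) in enumerate(edges):
        incident.setdefault(u, []).append(k)
        incident.setdefault(v, []).append(k)
    length = [math.dist(points[u], points[v]) for u, v in edges]
    angle = [orientation(points, e) for e in edges]
    kept: List[Edge] = []
    for k, (u, v) in enumerate(edges):
        neighbours = [j for w in (u, v) for j in incident[w] if j != k]
        collinear = any(angle_gap(angle[k], angle[j]) <= max_angle_deg
                        for j in neighbours)
        shortest = any(length[k] <= min(length[j] for j in incident[w])
                       for w in (u, v))
        if collinear or shortest:
            kept.append((u, v))
    return kept


def group_lines(n: int, edges: List[Edge]) -> List[List[int]]:
    """Connected components of the cut forest via union-find: one coarse line each."""
    root = list(range(n))

    def find(x: int) -> int:
        while root[x] != x:
            root[x] = root[root[x]]  # path halving
            x = root[x]
        return x

    for u, v in edges:
        root[find(u)] = find(v)
    groups: Dict[int, List[int]] = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def coarse_text_lines(points: List[Point], max_angle_deg: float = 20.0) -> List[List[int]]:
    tree = minimum_spanning_tree(points)
    return group_lines(len(points), cut_by_local_linearity(points, tree, max_angle_deg))


if __name__ == "__main__":
    # Two slightly wavy horizontal lines of four components each, 30 px apart.
    centroids = [(0, 0), (10, 1), (20, 0), (30, 1),
                 (0, 30), (10, 31), (20, 30), (30, 31)]
    print(coarse_text_lines(centroids))  # [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Because the rule compares each edge only with the edges that share its endpoints, it does not assume a global text direction, so two vertical lines split the same way as two horizontal ones. It also has obvious failure modes (for example, a single isolated component is always kept attached through its leaf edge). The paper follows its cut with a re-connection step based on text line energy minimization; that step is not sketched here.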