• DocumentCode
    638340
  • Title

    A simple text/graphic separation method for document image segmentation

  • Author

    Zirari, F. ; Ennaji, Abdellatif ; Nicolas, S. ; Mammass, D.

  • Author_Institution
    LITIS Lab., Univ. of Rouen, Rouen, France
  • fYear
    2013
  • fDate
    27-30 May 2013
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Page segmentation into text and non-text elements is an essential preprocessing step before optical character recognition (OCR). When segmentation is poor, an OCR classification engine produces garbage characters because of the non-text elements present. This paper presents a method to separate the textual and non-textual components of document images using graph-based modeling and structural analysis. The method is fast and efficient, and adequately separates the graphical and textual parts of a document. We evaluated our method on two well-known datasets: the UW-III dataset and the ICDAR 2009 page segmentation competition dataset. Comparisons with two state-of-the-art methods show that our method performs better on this task.
  • Keywords
    document image processing; graph theory; image segmentation; optical character recognition; text analysis; ICDAR 2009 page segmentation competition dataset; OCR classification engine; UW-III dataset; document image segmentation; garbage characters; graph-based modeling; nontext elements; optical character recognition operation; page segmentation; structural analysis; text elements; text-graphic separation method; Accuracy; Educational institutions; Histograms; Image edge detection; Image segmentation; Text categorization; connected components; document image; graph; structural analysis; text/non-text separating;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Systems and Applications (AICCSA), 2013 ACS International Conference on
  • Conference_Location
    Ifrane
  • ISSN
    2161-5322
  • Type

    conf

  • DOI
    10.1109/AICCSA.2013.6616493
  • Filename
    6616493