• DocumentCode
    3144942
  • Title
    Document image analysis with OCRopus
  • Author
    Shafait, Faisal
  • Author_Institution
    German Res. Center for Artificial Intell. (DFKI GmbH), Kaiserslautern, Germany
  • fYear
    2009
  • fDate
    14-15 Dec. 2009
  • Firstpage
    1
  • Lastpage
    6
  • Abstract
    Document image analysis is the field of converting paper documents into an editable electronic representation by performing optical character recognition (OCR). In recent years, there has been tremendous progress in the development of open source OCR systems. OCRopus is one of the leading open source document analysis systems, with a modular and pluggable architecture. This paper presents an overview of the different steps involved in a document image analysis system and illustrates them with examples from OCRopus.
  • Keywords
    document image processing; optical character recognition; public domain software; software architecture; OCRopus; document image analysis; electronic representation; modular-pluggable architecture; open source document analysis system; paper documents; Artificial intelligence; Books; Character recognition; Data structures; Image analysis; Image converters; Open source software; Optical character recognition software; Search engines; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Multitopic Conference, 2009. INMIC 2009. IEEE 13th International
  • Conference_Location
    Islamabad
  • Print_ISBN
    978-1-4244-4872-2
  • Electronic_ISBN
    978-1-4244-4873-9
  • Type
    conf
  • DOI
    10.1109/INMIC.2009.5383078
  • Filename
    5383078