Title :
Document image analysis with OCRopus
Author_Institution :
German Res. Center for Artificial Intell. (DFKI GmbH), Kaiserslautern, Germany
Abstract :
Document image analysis is the field of converting paper documents into an editable electronic representation by performing optical character recognition (OCR). In recent years, there has been tremendous progress in the development of open source OCR systems. OCRopus is one of the leading open source document analysis systems, with a modular and pluggable architecture. This paper presents an overview of the different steps involved in a document image analysis system and illustrates them with examples from OCRopus.
Keywords :
document image processing; optical character recognition; public domain software; software architecture; OCRopus; document image analysis; electronic representation; modular-pluggable architecture; open source document analysis system; paper documents; Artificial intelligence; Books; Character recognition; Data structures; Image analysis; Image converters; Open source software; Optical character recognition software; Search engines; Text analysis
Conference_Title :
Multitopic Conference, 2009. INMIC 2009. IEEE 13th International
Conference_Location :
Islamabad
Print_ISBN :
978-1-4244-4872-2
Electronic_ISBN :
978-1-4244-4873-9
DOI :
10.1109/INMIC.2009.5383078