Title :
Greek Polytonic OCR Based on Efficient Character Class Number Reduction
Author :
Gatos, B. ; Louloudis, G. ; Stamatopoulos, N.
Author_Institution :
Comput. Intell. Lab., Nat. Res. Center Demokritos, Athens, Greece
Abstract :
Recognition of document images containing Greek polytonic (multi-accent) characters is a challenging task due to the large number of existing character classes (more than 270). In this paper, we propose a novel OCR framework for the recognition of machine-printed Greek polytonic documents that combines five different recognition modules so that each module handles only a small number of classes (around 30). One recognition module is used for accent recognition, while four recognition modules are used for the recognition of characters belonging to different horizontal text zones. The proposed system also includes the following stages: (a) pre-processing, (b) text dewarping, text line and text baseline detection, (c) accent and character detection and (d) combination of accent and character recognition results. Extensive experiments have been conducted to record the performance of the proposed OCR system, of all involved recognition modules, and of the accent detection stage.
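The class-reduction idea in stage (d) can be illustrated with a minimal sketch: if accents and base characters are recognized separately, their labels can be merged into a single polytonic character afterwards. The abstract does not say how the authors perform this combination; the use of Unicode NFC composition, the function name combine, and the accent label names in ACCENT_MARKS are all assumptions made for illustration only.

```python
import unicodedata

# Hypothetical accent labels (not taken from the paper), mapped to
# Unicode combining marks used in polytonic Greek.
ACCENT_MARKS = {
    "none": "",
    "psili": "\u0313",            # smooth breathing
    "dasia": "\u0314",            # rough breathing
    "oxia": "\u0301",             # acute
    "varia": "\u0300",            # grave
    "perispomeni": "\u0342",      # circumflex
    "psili_oxia": "\u0313\u0301",
    "dasia_oxia": "\u0314\u0301",
}


def combine(base_char: str, accent_label: str) -> str:
    """Merge a recognized base character and a recognized accent label
    into one precomposed character via Unicode NFC normalization."""
    marks = ACCENT_MARKS[accent_label]
    return unicodedata.normalize("NFC", base_char + marks)


if __name__ == "__main__":
    # Each (base, accent) pair stands in for the outputs of two
    # separate recognition modules.
    for base, accent in [("α", "psili"), ("α", "psili_oxia"), ("ω", "perispomeni")]:
        print(base, accent, combine(base, accent))
```

With this kind of decomposition, two small label sets (base characters and accents) cover many combined characters, which is the general motivation for keeping the number of classes per module low.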
Keywords :
document image processing; optical character recognition; text analysis; Greek polytonic OCR system; character class number reduction; character detection; character recognition; horizontal text zone; machine printed Greek polytonic document image recognition module; text baseline detection; text dewarping; Accuracy; Character recognition; Image segmentation; Measurement; Optical character recognition software; Text recognition; Class number reduction; Greek polytonic characters; OCR; Word baseline detection;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2011.233