Title :
A Novel Feature Extraction and Classification Methodology for the Recognition of Historical Documents
Author :
Vamvakas, G. ; Gatos, B. ; Perantonis, S.J.
Author_Institution :
Comput. Intell. Lab., Nat. Centre for Sci. Res. Demokritos, Athens, Greece
Abstract :
In this paper, we present a methodology for off-line character recognition that focuses mainly on the difficult cases of historical fonts and styles. The proposed methodology relies on a new feature extraction technique based on recursive subdivisions of the image and on calculating the centre of mass of each sub-image with sub-pixel accuracy. Feature extraction is followed by a hierarchical classification scheme based on the level of granularity of the feature extraction method. Pairs of classes with high values in the confusion matrix are merged at a certain level, and features of higher granularity are employed to distinguish them. Several historical documents were used to demonstrate the efficiency of the proposed technique.
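The recursive subdivision described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the level-by-level grouping of features, the fallback for empty sub-images, and the rounding of each split to a pixel boundary are all assumptions made for illustration. In particular, the centres are reported as floating-point values, but the split itself is snapped to whole pixels, which is a simplification of the sub-pixel handling the abstract mentions.

```python
def centre_of_mass(img, top, left, bottom, right):
    """Centre of mass (row, col) of foreground pixels in the half-open
    rectangle [top, bottom) x [left, right).

    Assumption: an empty rectangle falls back to its geometric centre.
    """
    total = row_sum = col_sum = 0
    for r in range(top, bottom):
        for c in range(left, right):
            if img[r][c]:
                total += 1
                row_sum += r
                col_sum += c
    if total == 0:
        return ((top + bottom - 1) / 2, (left + right - 1) / 2)
    return (row_sum / total, col_sum / total)


def subdivision_features(img, levels):
    """Return one feature list per granularity level.

    Level k holds the (row, col) centres of its 4**k sub-images, so the
    list for level k has 2 * 4**k values.
    """
    rects = [(0, 0, len(img), len(img[0]))]
    per_level = []
    for _ in range(levels):
        feats, next_rects = [], []
        for top, left, bottom, right in rects:
            cy, cx = centre_of_mass(img, top, left, bottom, right)
            feats += [cy, cx]
            # Simplification: snap the split to a pixel boundary and
            # clamp it so it stays inside the current rectangle.
            sr = min(max(int(cy) + 1, top), bottom)
            sc = min(max(int(cx) + 1, left), right)
            next_rects += [
                (top, left, sr, sc), (top, sc, sr, right),
                (sr, left, bottom, sc), (sr, sc, bottom, right),
            ]
        per_level.append(feats)
        rects = next_rects
    return per_level
```

Grouping the features by level mirrors the idea of a hierarchical scheme: a coarse level could be used for a first classification pass, with finer levels reserved for pairs of classes that the confusion matrix shows to be frequently mistaken for each other.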
Keywords :
document image processing; feature extraction; handwritten character recognition; matrix algebra; optical character recognition; pattern classification; classification methodology; confusion matrix; granularity feature; historical document recognition; subpixel accuracy; Character recognition; Computational intelligence; Document handling; Informatics; Laboratories; Optical character recognition software; Support vector machine classification; Support vector machines; Text analysis; Historical Documents
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2009.223