DocumentCode :
1633493
Title :
Italic or Roman: Word Style Recognition without A Priori Knowledge for Old Printed Documents
Author :
Eynard, Loris ; Emptoz, Hubert
Author_Institution :
CNRS INSA-Lyon, Univ. de Lyon, Lyon, France
fYear :
2009
Firstpage :
823
Lastpage :
827
Abstract :
This paper presents an Italic/Roman word type recognition system that requires no a priori knowledge of the characters' font. The method targets old documents in which character segmentation is not trivial. Our approach therefore segments the document into words and analyzes the text word by word. To determine the word style, we combine three criteria based on the visual differences between a word and a slanted version of the same word. These criteria are defined from features computed on the vertical projection profile of the word. Because we do not assume a specific slant angle, we compute these measures over a whole range of possible slant angles and then sum the obtained scores. Our results show a recognition rate of 100% for Italic words and 97.2% for Roman words.
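The idea described in the abstract (compare a word with slanted versions of itself through the vertical projection profile, then sum the scores over a range of angles) can be illustrated with a minimal sketch. The abstract does not specify the three criteria, so the single "sharpness" feature below, the angle range, the zero threshold, and all function names are assumptions made for illustration only. The sketch also assumes a binary word image (rows of 0/1) and a right-leaning italic.

```python
import math

def vertical_profile(img):
    """Ink count per column of a binary image given as a list of 0/1 rows."""
    return [sum(col) for col in zip(*img)]

def shear(img, angle_deg):
    """Shear the image horizontally around its bottom row.

    A positive angle moves upper rows to the left, which undoes a
    right-leaning slant. The canvas is padded so no pixel is lost.
    """
    h, w = len(img), len(img[0])
    t = math.tan(math.radians(angle_deg))
    pad = int(math.ceil(abs(t) * (h - 1)))
    out = [[0] * (w + 2 * pad) for _ in range(h)]
    for y, row in enumerate(img):
        shift = round(t * (h - 1 - y))  # proportional to height above the bottom row
        for x, v in enumerate(row):
            if v:
                out[y][x + pad - shift] = 1
    return out

def sharpness(img):
    """Assumed feature: sum of squared column counts (high for upright strokes)."""
    return sum(c * c for c in vertical_profile(img))

def classify(img, angles=(5, 10, 15, 20, 25)):
    """Sum, over all candidate angles, the sharpness gained by de-slanting.

    A positive total means de-slanting makes the strokes more vertical,
    which this sketch interprets as an italic word.
    """
    base = sharpness(img)
    score = sum(sharpness(shear(img, a)) - base for a in angles)
    return "italic" if score > 0 else "roman"

# Toy example: two vertical strokes, and the same strokes slanted to the right.
roman_word = [[1 if x in (2, 6) else 0 for x in range(10)] for _ in range(8)]
italic_word = shear(roman_word, -15)
print(classify(roman_word), classify(italic_word))
```

Summing over several angles, rather than testing one fixed angle, mirrors the abstract's point that no specific slant angle is assumed. A real scan would need binarization and word segmentation first, which this sketch leaves out.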
Keywords :
document handling; pattern recognition; text analysis; Italic-Roman word type recognition; document segmentation; old printed document; slant angle; text analysis; word style recognition; word vertical projection profile; Character recognition; Feature extraction; Histograms; Humans; Image segmentation; Ink; Optical character recognition software; Text analysis; Text recognition; Typesetting; Italic Recognition; old documents; segmentation-free; word style;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
ISSN :
1520-5363
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2009.176
Filename :
5277521