DocumentCode :
2482229
Title :
Improvements in hidden Markov model based Arabic OCR
Author :
Prasad, Rohit ; Saleem, Shirin ; Kamali, Matin ; Meermeier, Ralf ; Natarajan, Prem
Author_Institution :
BBN Technol., Cambridge, MA
fYear :
2008
fDate :
8-11 Dec. 2008
Firstpage :
1
Lastpage :
4
Abstract :
This paper describes recent advances in hidden Markov model (HMM)-based OCR for machine-printed Arabic documents. A combination of script-independent and script-specific techniques is applied to glyph models and language models (LMs). The script-independent techniques are higher-order n-gram LMs for N-best rescoring and discriminative estimation of glyph HMMs. The Arabic-specific techniques include context-dependent HMMs for glyph modeling and Parts-of-Arabic-Words in language modeling. We present experimental results demonstrating a 40% relative reduction in word error rate over the baseline configuration on a corpus of machine-printed Arabic documents.
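The abstract names higher-order n-gram LMs for N-best rescoring but does not spell out the procedure. The following is a minimal sketch of generic N-best rescoring, not the authors' implementation. It assumes that each recognizer hypothesis carries a glyph-HMM log score, that a trigram LM with add-one smoothing scores the word sequence, and that hypotheses are re-ranked by a weighted sum of the two scores plus a word penalty. The smoothing scheme, n-gram order, `lm_weight`, and `word_penalty` are all assumptions; the names `TrigramLM`, `rescore_nbest`, and `Hypothesis` are hypothetical; and English tokens stand in for Arabic words.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical type: a hypothesis is (word sequence, glyph-HMM log score).
Hypothesis = Tuple[List[str], float]

BOS, EOS = "<s>", "</s>"


class TrigramLM:
    """Add-one smoothed trigram LM (an assumed smoothing scheme for illustration)."""

    def __init__(self, sentences: Sequence[Sequence[str]]) -> None:
        self.trigrams: Counter = Counter()  # counts of (u, v, w)
        self.contexts: Counter = Counter()  # counts of (u, v) used as a trigram history
        vocab = {EOS}
        for words in sentences:
            vocab.update(words)
            padded = [BOS, BOS, *words, EOS]
            for i in range(2, len(padded)):
                ctx = (padded[i - 2], padded[i - 1])
                self.trigrams[ctx + (padded[i],)] += 1
                self.contexts[ctx] += 1
        self.vocab_size = len(vocab)

    def logprob(self, words: Sequence[str]) -> float:
        """Natural-log probability of a sentence, including the end-of-sentence token."""
        padded = [BOS, BOS, *words, EOS]
        total = 0.0
        for i in range(2, len(padded)):
            ctx = (padded[i - 2], padded[i - 1])
            num = self.trigrams[ctx + (padded[i],)] + 1  # add-one numerator
            den = self.contexts[ctx] + self.vocab_size   # add-one denominator
            total += math.log(num / den)
        return total


def rescore_nbest(
    nbest: Sequence[Hypothesis],
    lm: TrigramLM,
    lm_weight: float = 1.0,
    word_penalty: float = 0.0,
) -> List[Hypothesis]:
    """Re-rank hypotheses by glyph score + weighted LM score + word penalty, best first."""

    def combined(hyp: Hypothesis) -> float:
        words, glyph_score = hyp
        return glyph_score + lm_weight * lm.logprob(words) + word_penalty * len(words)

    return sorted(nbest, key=combined, reverse=True)


if __name__ == "__main__":
    lm = TrigramLM([["the", "cat", "sat"], ["the", "cat", "ran"]])
    nbest = [(["the", "bat", "sat"], -9.5), (["the", "cat", "sat"], -10.0)]
    print(rescore_nbest(nbest, lm)[0][0])  # ['the', 'cat', 'sat']
```

In this toy run the glyph score alone (`lm_weight=0.0`) prefers "the bat sat", and the LM score flips the ranking to "the cat sat". The scores are combined in the log domain, so the LM weight and word penalty act as tunable scaling and length-normalization terms. In practice such terms are typically tuned on held-out data.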
Keywords :
document image processing; hidden Markov models; optical character recognition; Arabic OCR; hidden Markov model; language model; machine-printed Arabic document; script-independent technique; Character generation; Character recognition; Context modeling; Error analysis; Feature extraction; Hidden Markov models; Lattices; Optical character recognition software; Shape; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition, 2008. ICPR 2008. 19th International Conference on
Conference_Location :
Tampa, FL
ISSN :
1051-4651
Print_ISBN :
978-1-4244-2174-9
Electronic_ISBN :
1051-4651
Type :
conf
DOI :
10.1109/ICPR.2008.4761446
Filename :
4761446