DocumentCode :
3341355
Title :
End-to-End Trainable Thai OCR System Using Hidden Markov Models
Author :
Krstovski, Kriste ; Macrostie, Ehry ; Prasad, Rohit ; Natarajan, Premkumar
Author_Institution :
BBN Technol., Cambridge, MA
fYear :
2008
fDate :
16-19 Sept. 2008
Firstpage :
607
Lastpage :
614
Abstract :
In this paper we present an end-to-end trainable optical character recognition (OCR) system for recognizing machine-printed text in Thai documents. The end-to-end OCR system is based on a script-independent methodology using hidden Markov models. Our system provides an integrated workflow that begins with the annotation and transcription of training images and ends with performing OCR on new images, using models trained on the transcribed training images. The efficacy of our end-to-end OCR system is demonstrated by rapidly configuring our OCR engine for the Thai script. We present experimental results on Thai documents to highlight the specific challenges posed by the Thai script, and we analyze recognition performance as a function of the amount of training data.
Keywords :
document handling; hidden Markov models; optical character recognition; text analysis; end-to-end trainable Thai OCR system; hidden Markov models; machine-printed text; optical character recognition; script-independent methodology; Character recognition; Engines; Gaussian processes; Hidden Markov models; Humans; Optical character recognition software; Performance analysis; Probability; Text recognition; Training data; HMM; OCR; Thai; Thai script; annotation; document images; end-to-end; integrated workflow; script-independent; training; transcription;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
Type :
conf
DOI :
10.1109/DAS.2008.76
Filename :
4670012