DocumentCode :
591974
Title :
Statistical Machine Translation as a Language Model for Handwriting Recognition
Author :
Devlin, Jacob ; Kamali, Matin ; Subramanian, Krishna ; Prasad, Rohit ; Natarajan, Prem
Author_Institution :
Raytheon BBN Technologies, Cambridge, MA, USA
fYear :
2012
fDate :
18-20 Sept. 2012
Firstpage :
291
Lastpage :
296
Abstract :
When performing handwriting recognition on natural language text, the use of a word-level language model (LM) is known to significantly improve recognition accuracy. The most common type of language model, the n-gram model, decomposes sentences into short, overlapping chunks. In this paper, we propose a new type of language model which we use in addition to the standard n-gram LM. Our new model uses the likelihood score from a statistical machine translation system as a reranking feature. In general terms, we automatically translate each OCR hypothesis into another language, and then create a feature score based on how "difficult" it was to perform the translation. Intuitively, the difficulty of translation correlates with how well-formed the input sentence is. In an Arabic handwriting recognition task, we were able to obtain a 0.4% absolute improvement in word error rate (WER) on top of a powerful 5-gram LM.
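The reranking idea in the abstract can be illustrated with a minimal sketch. It assumes each hypothesis in an n-best list already carries three log-scores (recognizer, n-gram LM, and a translation likelihood) and that they are combined as a weighted sum. The class name, field names, weights, and scores below are all hypothetical; the abstract does not specify the paper's actual feature set or how its weights were tuned.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    text: str
    recognizer_logprob: float  # score from the handwriting recognizer (assumed)
    ngram_logprob: float       # score from the n-gram LM (assumed)
    smt_logprob: float         # translation likelihood score (assumed feature)


def rerank(nbest, weights):
    """Sort hypotheses by a weighted sum of their log-scores, best first."""
    def total(h):
        return (weights["recognizer"] * h.recognizer_logprob
                + weights["ngram"] * h.ngram_logprob
                + weights["smt"] * h.smt_logprob)
    return sorted(nbest, key=total, reverse=True)


# Illustrative scores only: hypothesis A is slightly preferred by the
# recognizer and n-gram LM, but is much "harder" to translate.
nbest = [
    Hypothesis("hypothesis A", -10.0, -20.0, -35.0),
    Hypothesis("hypothesis B", -10.5, -20.5, -25.0),
]
weights = {"recognizer": 1.0, "ngram": 0.5, "smt": 0.2}

best = rerank(nbest, weights)[0]
print(best.text)
```

With these made-up numbers, the translation feature is what moves hypothesis B ahead of A; setting its weight to zero restores the original ordering.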
Keywords :
handwriting recognition; language translation; natural language processing; Arabic handwriting recognition task; OCR hypothesis; feature score; likelihood score; natural language text; overlapping chunks; reranking feature; standard n-gram model; statistical machine translation system; word error rate; word level language model; Buildings; Computational modeling; Hidden Markov models; Optical character recognition software; Training; Viterbi algorithm; machine translation
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2012 International Conference on
Conference_Location :
Bari
Print_ISBN :
978-1-4673-2262-1
Type :
conf
DOI :
10.1109/ICFHR.2012.273
Filename :
6424408