DocumentCode :
594715
Title :
Multilingual word spotting in offline handwritten documents
Author :
Wshah, S. ; Kumar, Girish ; Govindaraju, Vengatesan
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. at Buffalo, Buffalo, NY, USA
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
310
Lastpage :
313
Abstract :
In this work, we propose a novel multilingual word spotting framework based on Hidden Markov Models that works on a corpus of multilingual handwritten documents and on documents that contain more than one handwritten script. The system handles large multilingual vocabularies without the need for word or character segmentation. A keyword is represented by concatenating its character models. We propose and compare two systems: a script identifier based (IDB) system and a script identifier free (IDF) system. IDB applies an HMM-based script identifier before spotting a keyword, while IDF performs the spotting without script identification. The system is evaluated on a mixed corpus of public datasets from several scripts, such as IAM for English, AMA for Arabic, and LAW for Devanagari, and on a synthetic dataset generated by concatenating words and lines from different scripts in a document image.
Keywords :
document image processing; handwritten character recognition; hidden Markov models; image representation; vocabulary; AMA; Arabic; Devanagari; English; HMM; IAM; LAW; concatenating word; document image; hidden Markov model; keyword representation; multilingual handwritten document; multilingual vocabulary; multilingual word spotting; offline handwritten document; script identifier based system; script identifier free system; synthetic dataset; Feature extraction; Hidden Markov models; Image segmentation; Pattern recognition; Testing; Training; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460134