Title :
Multilingual word spotting in offline handwritten documents
Author :
Wshah, S. ; Kumar, Girish ; Govindaraju, Vengatesan
Author_Institution :
Dept. of Comput. Sci. & Eng., Univ. at Buffalo, Buffalo, NY, USA
Abstract :
In this work, we propose a novel multilingual word spotting framework based on Hidden Markov Models (HMMs) that works on a corpus of multilingual handwritten documents and on documents that contain more than one handwritten script. The system handles large multilingual vocabularies without the need for word or character segmentation. A keyword is represented by concatenating its character models. We propose and compare two systems: a script identifier based (IDB) system and a script identifier free (IDF) system. IDB applies an HMM-based script identifier before spotting a keyword, whereas IDF performs the spotting without script identification. The system is evaluated on a mixed corpus of public datasets from several scripts, such as IAM for English, AMA for Arabic and LAW for Devanagari, and on a synthetic dataset generated by concatenating words and lines from different scripts in a document image.
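The keyword representation mentioned in the abstract (concatenating character models) can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not taken from the paper: the left-to-right topology, the three states per character, the self-loop probability of 0.6, and the helper names `make_char_model`, `concatenate_models` and `build_keyword_model` are all hypothetical. Only the transition structure is shown; emission models and decoding are omitted.

```python
def make_char_model(n_states, self_loop=0.6):
    """Transition matrix of a left-to-right character HMM.

    Hypothetical topology: each state either stays put (self_loop)
    or moves to the next state (1 - self_loop). The last state's
    forward probability is returned separately as the exit probability.
    """
    trans = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states):
        trans[i][i] = self_loop
        if i + 1 < n_states:
            trans[i][i + 1] = 1.0 - self_loop
    exit_prob = 1.0 - self_loop
    return trans, exit_prob


def concatenate_models(models):
    """Chain character HMMs into one keyword HMM.

    Each character's block is copied onto the diagonal, and the last
    state of a character is linked to the first state of the next one
    using that character's exit probability.
    """
    total = sum(len(trans) for trans, _ in models)
    keyword = [[0.0] * total for _ in range(total)]
    offset = 0
    for idx, (trans, exit_prob) in enumerate(models):
        n = len(trans)
        for i in range(n):
            for j in range(n):
                keyword[offset + i][offset + j] = trans[i][j]
        if idx + 1 < len(models):
            keyword[offset + n - 1][offset + n] = exit_prob
        offset += n
    return keyword


def build_keyword_model(word, char_models):
    """Look up each character's model and concatenate them in order."""
    return concatenate_models([char_models[ch] for ch in word])


# Toy character inventory: three states per character (assumed).
char_models = {ch: make_char_model(3) for ch in "act"}
cat = build_keyword_model("cat", char_models)
print(len(cat))  # 9 states: 3 characters x 3 states each
```

Because a keyword model is assembled from shared character models, a new keyword needs no word-level training data, which is consistent with the abstract's claim of handling large vocabularies without word or character segmentation.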
Keywords :
document image processing; handwritten character recognition; hidden Markov models; image representation; vocabulary; AMA; Arabic; Devanagari; English; HMM; IAM; LAW; concatenating word; document image; hidden Markov model; keyword representation; multilingual handwritten document; multilingual vocabulary; multilingual word spotting; offline handwritten document; script identifier based system; script identifier free system; synthetic dataset; Feature extraction; Hidden Markov models; Image segmentation; Pattern recognition; Testing; Training; Vocabulary;
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
Print_ISBN :
978-1-4673-2216-4