DocumentCode :
3489224
Title :
Holistic Arabic Whole Word Recognition Using HMM and Block-Based DCT
Author :
Krayem, Abdulwahab ; Sherkat, Nasser ; Evett, Lindsay ; Osman, Taha
Author_Institution :
Sch. of Sci. & Technol., Nottingham Trent Univ., Nottingham, UK
fYear :
2013
fDate :
25-28 Aug. 2013
Firstpage :
1120
Lastpage :
1124
Abstract :
A review of the published research confirms that recognition of printed Arabic words continues to present challenges. This is especially the case when segmentation is problematic. A word-level recognition system is presented here that does not rely on any segmentation or require baseline detection of ascenders and descenders. A discrete Hidden Markov classifier along with a block-based Discrete Cosine Transform (DCT) is used to construct a novel holistic Arabic printed word recogniser. A balanced database of word images has been constructed to ensure an even distribution of word samples. The Arabic words are typewritten in five fonts at a size of 14 points in a plain style. The system is applied to actual scanned word images with no overlap between the training and testing datasets. Word feature vectors are extracted using block-based DCT. The Hidden Markov Model Toolkit (HTK) is used to construct the recogniser. Vector Quantisation is used to map each feature vector to the closest symbol in the codebook. The output of the system is multiple recognition hypotheses (N-best word lattice). The results compare favourably with other published research in this area: the system achieves 97.65% accuracy on average, which is significantly higher than previously published results. A detailed comparison and analysis of the results are presented.
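The two feature steps named in the abstract (block-based DCT followed by vector quantisation against a codebook) can be illustrated with a minimal sketch. It is not the authors' implementation: the block size, the choice of which low-frequency coefficients to keep, the row-major block order, and the function names `dct2_block`, `block_dct_features`, and `quantise` are all assumptions made for illustration, and the codebook is assumed to be given rather than trained.

```python
import math


def dct2_block(block):
    """Orthonormal 2-D DCT-II of a square block given as a list of rows."""
    n = len(block)

    def alpha(k):
        # Normalisation factor that makes the transform orthonormal.
        return math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)

    out = [[0.0] * n for _ in range(n)]
    for u in range(n):
        for v in range(n):
            total = 0.0
            for x in range(n):
                for y in range(n):
                    total += (block[x][y]
                              * math.cos((2 * x + 1) * u * math.pi / (2 * n))
                              * math.cos((2 * y + 1) * v * math.pi / (2 * n)))
            out[u][v] = alpha(u) * alpha(v) * total
    return out


def block_dct_features(image, block_size=4, keep=2):
    """Split the image into non-overlapping blocks and return one feature
    vector per block, made of the top-left keep x keep DCT coefficients.
    (Assumption: block size and coefficient selection are illustrative.)"""
    rows, cols = len(image), len(image[0])
    features = []
    for r in range(0, rows - block_size + 1, block_size):
        for c in range(0, cols - block_size + 1, block_size):
            block = [row[c:c + block_size] for row in image[r:r + block_size]]
            coeffs = dct2_block(block)
            features.append([coeffs[u][v]
                             for u in range(keep) for v in range(keep)])
    return features


def quantise(vector, codebook):
    """Return the index of the codebook entry closest to the vector
    (squared Euclidean distance), i.e. the discrete symbol for the HMM."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(vector, codebook[i])),
    )
```

Applied to a word image, `block_dct_features` yields a sequence of vectors, and mapping each one through `quantise` gives the sequence of discrete symbols that a discrete HMM would consume.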
Keywords :
discrete cosine transforms; hidden Markov models; image segmentation; natural languages; optical character recognition; vector quantisation; HMM; Holistic Arabic whole word recognition; block-based DCT; block-based discrete cosine transform; discrete hidden Markov classifier; hidden Markov models toolkit; holistic Arabic printed word recognizer; vector quantisation; word level recognition system; Databases; Discrete cosine transforms; Feature extraction; Hidden Markov models; Testing; Training; Vectors; Arabic OCR; Arabic fonts; Block-based DCT; HTK Toolkit; Holistic approach; Vector quantizer;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
ISSN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2013.227
Filename :
6628788