Title :
OCR with Word Prediction Technique for Bilingual Documents
Author :
Tangwongsan, Supachai ; Suvacharakulton, Buntida
Author_Institution :
Fac. of Inf. & Commun. Technol., Mahidol Univ., Bangkok, Thailand
Date :
May 30, 2012 - June 1, 2012
Abstract :
This paper proposes a working model of a bilingual OCR system for printed Thai and English text that uses a word prediction technique. Instead of recognizing individual characters from an image block, as in the conventional approach, the system attempts to match the whole word against a list of predicted words drawn from n-gram trees. Matching takes place in the word verification stage, in which both positive and negative matching are performed. If a match is found, the system advances to the next position at the end of the word boundary. The longer the matched word, the greater the gain in system performance. Experimental results show an average speed improvement of 21% while maintaining the expected recognition accuracy.
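The predict-then-verify loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation and rests on several assumptions: a plain character-level prefix trie stands in for the paper's n-gram trees, a string comparison stands in for image-level word verification, and the names `build_trie`, `predict`, `recognize`, and `prefix_len` are hypothetical. The distinction between positive and negative matching is not modeled.

```python
from typing import Dict, List, Tuple


class TrieNode:
    """One node of a character-level prefix trie (assumed stand-in for an n-gram tree)."""

    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False


def build_trie(words: List[str]) -> TrieNode:
    """Insert every lexicon word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def predict(root: TrieNode, prefix: str) -> List[str]:
    """Return all lexicon words that start with `prefix`, longest first."""
    node = root
    for ch in prefix:
        if ch not in node.children:
            return []
        node = node.children[ch]
    found: List[str] = []
    stack = [(node, prefix)]
    while stack:
        cur, text = stack.pop()
        if cur.is_word:
            found.append(text)
        for ch, child in cur.children.items():
            stack.append((child, text + ch))
    return sorted(found, key=lambda w: (-len(w), w))


def recognize(glyphs: str, root: TrieNode, prefix_len: int = 2) -> Tuple[str, int]:
    """Simulate recognition of `glyphs`, a string standing in for the image.

    Returns the recognized text and the number of positions that needed
    per-character recognition.
    """
    out: List[str] = []
    recognized = set()  # positions recognized one character at a time
    i = 0
    while i < len(glyphs):
        # Conventional step: recognize a short prefix character by character.
        prefix = glyphs[i:i + prefix_len]
        recognized.update(range(i, i + len(prefix)))
        # Word verification (assumption: string comparison replaces image matching).
        match = next((w for w in predict(root, prefix) if glyphs.startswith(w, i)), None)
        if match is not None:
            out.append(match)
            i += len(match)  # jump to the end of the matched word
        else:
            out.append(glyphs[i])  # fall back to a single character
            i += 1
    return "".join(out), len(recognized)


lexicon = build_trie(["recognition", "character", "word"])
text, calls = recognize("word recognition", lexicon)
print(text, calls)
```

In this toy run, 5 of the 16 positions need per-character recognition and the rest are covered by whole-word matches. That ratio is a property of the example input only and says nothing about the 21% figure reported in the abstract.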
Keywords :
document image processing; image matching; natural language processing; optical character recognition; string matching; text analysis; trees (mathematics); word processing; English text; Thai text; bilingual OCR system; bilingual documents; image block; individual characters recognition; matching process; n-gram trees; negative matching; positive matching; predictive words; word boundary; word prediction technique; word verification; Accuracy; Character recognition; Dictionaries; Image segmentation; Optical character recognition software; Strips; Testing; bilingual OCR; dictionary look-up; n-gram; word prediction; word verification;
Conference_Title :
Computer and Information Science (ICIS), 2012 IEEE/ACIS 11th International Conference on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4673-1536-4
DOI :
10.1109/ICIS.2012.77