Title :
Progress of combining trigram and Winnow in Thai OCR error correction
Author :
Meknavin, Surapant ; Kijsirikul, Boonserm ; Chotimongkol, Ananlada ; Nuttee, Cholwich
Author_Institution :
Nat. Electron. & Comput. Technol. Center, Bangkok, Thailand
Abstract :
Because of specific characteristics of Thai, Thai OCR errors frequently depend on nearby characters. To capture this characteristic of Thai OCR errors more appropriately, we propose using a varied n-gram of the character confusion probability for scoring approximately matched words. The value of n depends on the characteristics of each character. For languages that have no explicit word boundaries, word boundary ambiguity has to be resolved before errors can be corrected. In this paper, a maximal matching algorithm is used instead of a more complicated word segmentation algorithm to reduce time complexity. Finally, a hybrid method that combines a part-of-speech trigram model with the Winnow algorithm is used to select the most probable correction.
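Illustrative sketch (not from the paper): the abstract names a maximal matching algorithm for resolving word boundaries but gives no details. The Python sketch below assumes the common reading of "maximal matching" in the word segmentation literature, namely choosing the segmentation that uses the fewest dictionary words, and implements it with dynamic programming. The function name, the toy Latin-alphabet dictionary, and the tie-breaking behaviour are all hypothetical and may differ from what the authors actually did.

```python
from typing import List, Optional, Set


def maximal_matching(text: str, dictionary: Set[str]) -> Optional[List[str]]:
    """Segment `text` into the fewest dictionary words.

    Returns None when no full segmentation exists. This is an assumed
    reading of "maximal matching", not the paper's own implementation.
    """
    n = len(text)
    max_len = max((len(w) for w in dictionary), default=0)

    # best[i]: fewest words needed to cover text[:i], or None if unreachable.
    # back[i]: start index of the last word in that best segmentation.
    best: List[Optional[int]] = [None] * (n + 1)
    back: List[int] = [-1] * (n + 1)
    best[0] = 0

    for end in range(1, n + 1):
        # Only look back as far as the longest dictionary word.
        for start in range(max(0, end - max_len), end):
            if best[start] is None or text[start:end] not in dictionary:
                continue
            candidate = best[start] + 1
            if best[end] is None or candidate < best[end]:
                best[end] = candidate
                back[end] = start

    if best[n] is None:
        return None

    # Walk the back-pointers from the end to recover the words.
    words: List[str] = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    words.reverse()
    return words


# Toy example: greedy longest matching would take "abc" first and need
# three words ("abc", "d", "e"); the fewest-words criterion needs two.
toy_dictionary = {"ab", "abc", "cde", "d", "e"}
print(maximal_matching("abcde", toy_dictionary))
```

In this toy case the fewest-words segmentation is unique. When several segmentations tie, the sketch keeps the first one it finds; a real system would need a further criterion to choose among them.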
Keywords :
computational complexity; error correction; optical character recognition; probability; OCR; Thai; Winnow; character confusion probability; hybrid method; maximal matching algorithm; time complexity problem; trigram; varied n-gram; word boundary ambiguity; Character recognition; Computer errors; Natural languages; Optical character recognition software
Conference_Title :
Circuits and Systems, 1998. IEEE APCCAS 1998. The 1998 IEEE Asia-Pacific Conference on
Conference_Location :
Chiangmai
Print_ISBN :
0-7803-5146-0
DOI :
10.1109/APCCAS.1998.743880