Title :
OCR error detection and correction of an inflectional Indian language script
Author :
Chaudhuri, B.B. ; Pal, U.
Author_Institution :
Comput. Vision & Pattern Recognition Unit, Indian Stat. Inst., Calcutta, India
Abstract :
This paper presents an OCR error detection and correction technique for a highly inflectional language script such as Bangla (a major Indian language). This is the first report of its kind. Using two separate lexicons of root words and suffixes, candidate root-suffix pairs of each input word are detected, their grammatical agreement is tested, and the part (root or suffix) in which the error has occurred is noted. The correction is made on the corresponding erroneous part of the input string by a fast dictionary access technique. To do so, alternative strings are generated for each erroneous word. Among the alternative strings, those that satisfy grammatical agreement between root and suffix and also have the smallest Levenshtein-Damerau distance are finally chosen as the correct ones. The system has an accuracy of 75.61%.
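The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicons `ROOTS` and `SUFFIXES`, the helper names `agrees` and `correct`, and the English stand-in words are all hypothetical, and the paper's fast dictionary access technique is replaced here by a plain linear scan. The distance used is the restricted Damerau-Levenshtein (optimal string alignment) variant, which is an assumption about the exact metric.

```python
from typing import Dict, List, Set

# Hypothetical toy lexicons (English stand-ins, not taken from the paper).
# Each root carries a word class; each suffix lists the classes it may attach to.
ROOTS: Dict[str, str] = {"walk": "verb", "talk": "verb", "book": "noun"}
SUFFIXES: Dict[str, Set[str]] = {
    "": {"verb", "noun"},
    "ed": {"verb"},
    "ing": {"verb"},
    "s": {"verb", "noun"},
}


def osa_distance(a: str, b: str) -> int:
    """Restricted Damerau-Levenshtein distance (adjacent transposition costs 1)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i
    for j in range(len(b) + 1):
        d[0][j] = j
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # transposition
    return d[len(a)][len(b)]


def agrees(root: str, suffix: str) -> bool:
    """Toy grammatical agreement: the suffix must accept the root's word class."""
    return root in ROOTS and suffix in SUFFIXES and ROOTS[root] in SUFFIXES[suffix]


def correct(word: str) -> List[str]:
    """Return `word` if some root-suffix split is valid, else the closest valid forms."""
    splits = [(word[:k], word[k:]) for k in range(len(word) + 1)]
    if any(agrees(head, tail) for head, tail in splits):
        return [word]  # no error detected

    candidates: Dict[str, int] = {}
    for head, tail in splits:
        if tail in SUFFIXES:  # suffix looks valid, so assume the error is in the root
            for root in ROOTS:
                if agrees(root, tail):
                    dist = osa_distance(head, root)
                    cand = root + tail
                    candidates[cand] = min(candidates.get(cand, dist), dist)
        if head in ROOTS:  # root looks valid, so assume the error is in the suffix
            for suffix in SUFFIXES:
                if agrees(head, suffix):
                    dist = osa_distance(tail, suffix)
                    cand = head + suffix
                    candidates[cand] = min(candidates.get(cand, dist), dist)

    if not candidates:
        return []
    best = min(candidates.values())
    return sorted(c for c, dist in candidates.items() if dist == best)
```

In this sketch, `correct("wakled")` localizes the error to the root part (the suffix `ed` is valid) and returns `["walked"]`, while `correct("bookz")` returns the tie `["book", "books"]`; forms such as `booked` are never proposed because the toy agreement table does not allow `ed` on a noun root.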
Keywords :
optical character recognition; table lookup; Bangla; Levenstein-Damerau distance; OCR error correction; OCR error detection; candidate root-suffix pairs; fast dictionary access technique; grammatical agreement; inflectional Indian language script; lexicons; root words; Character recognition; Computer errors; Computer vision; Dictionaries; Error correction; Libraries; Optical character recognition software; Optical design; Pattern recognition; Testing;
Conference_Title :
Proceedings of the 13th International Conference on Pattern Recognition, 1996
Conference_Location :
Vienna
Print_ISBN :
0-8186-7282-X
DOI :
10.1109/ICPR.1996.546947