Title :
MOrpho-LEXical analysis for correcting OCR-generated Arabic words (MOLEX)
Author :
Sari, Toufik ; Sellami, Mokhtar
Author_Institution :
Dept. d'Inf., Univ. Badji Mokhtar, Annaba, Algeria
Abstract :
In this paper we present a contextual-based method for correcting Arabic words generated by OCR systems. The technique operates as a post-processor and is intended to be universal. It corrects substitution and rejection errors. The properties of the Arabic language are very useful in morpho-lexical analysis and are therefore heavily exploited in the development of the method. Substitution errors, the errors most frequently committed by OCR systems, are rewritten as production rules for use by a rule-based system that corrects Arabic words. The first version of the method operates only at the morpho-lexical level; its extension to the other levels of language analysis is left as future work.
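Illustrative sketch (not from the paper): the abstract describes substitution errors encoded as production rules and applied by a rule-based post-processor. The Python fragment below shows one way such a scheme could look, under loud assumptions: the rule table SUBSTITUTION_RULES, the rejection marker REJECT, and the functions candidates and correct are all hypothetical names, the confusion pairs are invented examples of letters that differ only in their dots, and the morphological step (affix and root analysis) is replaced by a plain lexicon lookup.

```python
from itertools import product

# Hypothetical production rules: OCR output character -> characters it may stand for.
# Illustrative pairs only (letters differing in dots); not the rules of the paper.
SUBSTITUTION_RULES = {
    "ب": ["ت", "ث", "ن"],
    "ت": ["ب", "ث"],
    "ح": ["ج", "خ"],
    "ر": ["ز"],
    "د": ["ذ"],
}

# Assumed placeholder that the OCR emits for a rejected (unrecognized) character.
REJECT = "?"


def candidates(word, alphabet):
    """Yield every word obtainable by applying the rules position by position."""
    options = []
    for ch in word:
        if ch == REJECT:
            # A rejected character may be any letter of the alphabet.
            options.append(sorted(alphabet))
        else:
            # Keep the character itself, then try each rule-based alternative.
            options.append([ch] + SUBSTITUTION_RULES.get(ch, []))
    for combo in product(*options):
        yield "".join(combo)


def correct(word, lexicon):
    """Return the lexicon words reachable from an OCR output word."""
    if word in lexicon:
        return [word]
    alphabet = {ch for entry in lexicon for ch in entry}
    found = []
    for cand in candidates(word, alphabet):
        if cand in lexicon and cand not in found:
            found.append(cand)
    return found


# Toy lexicon, invented for the example.
lexicon = {"كتب", "بيت", "جبل"}
print(correct("كبب", lexicon))  # substitution error in the second letter
print(correct("بي?", lexicon))  # rejection error in the last letter
```

The candidate set grows exponentially with word length, so a practical system would prune it, for instance by validating prefixes, roots, and suffixes separately; that is the role morpho-lexical analysis would play here, and it is omitted from this sketch.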
Keywords :
knowledge based systems; optical character recognition; Arabic linguistics; OCR-generated Arabic words correction; contextual-based method; morpholexical analysis; production rules; rejection errors; rule-based system; substitution errors; Acoustics; Dictionaries; Error correction; Heart; Hidden Markov models; Knowledge based systems; Natural language processing; Optical character recognition software; Production systems; Speech recognition;
Conference_Title :
Frontiers in Handwriting Recognition, 2002. Proceedings. Eighth International Workshop on
Print_ISBN :
0-7695-1692-0
DOI :
10.1109/IWFHR.2002.1030953