Title :
Natural Language Morphology Integration in Off-Line Arabic Optical Text Recognition
Author :
Kanoun, Slim ; Alimi, Adel M. ; Lecourtier, Yves
Author_Institution :
Research Group on Intelligent Machines (REGIM), University of Sfax, Sfax, Tunisia
fDate :
4/1/2011
Abstract :
In this paper, we propose a new linguistics-based approach, called the affixal approach, for recognizing Arabic word and text images. Most existing work in the field integrates knowledge of the Arabic language into recognition in one of two ways: either after recognition, using a language dictionary (a dictionary of words) to validate the word hypotheses proposed by the OCR, or during recognition (lexicon-directed recognition), using a statistical language model (a hidden Markov model or an N-gram model). The proposed approach instead uses the linguistic concepts of the Arabic vocabulary to direct and simplify the recognition process. Its principal contribution is the ability to categorize word hypotheses as either derived or not derived from roots, and to characterize each word hypothesis morphologically, thereby preparing the text hypotheses for later analyses (for example, syntactic analysis to filter sentence hypotheses).
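The abstract does not give the algorithm itself, so the following is only a minimal, hypothetical sketch of the general idea of affixal analysis applied to OCR word hypotheses, not the authors' implementation. It assumes a toy morphology: small hand-picked prefix, suffix, pattern, and root inventories written in Buckwalter transliteration (e.g. `ktb` for the root ك ت ب), all of which are illustrative assumptions. Each hypothesis is split into prefix + stem + suffix; the stem is either found in a list of non-derived words or matched against a pattern whose `C` slots yield a root. Hypotheses with no analysis are rejected, and the surviving ones carry a category (derived or non-derived) and a morphological description. The names `analyze`, `match_pattern`, and `filter_hypotheses` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical toy inventories in Buckwalter transliteration (not taken from the paper).
PREFIXES = ["", "w", "f", "b", "l", "Al", "wAl", "bAl"]  # e.g. w = "and", Al = "the"
SUFFIXES = ["", "p", "At", "wn", "yn", "h", "hA"]        # p = taa marbuta, At/wn/yn = plural endings
PATTERNS = ["CCC", "CACC", "CCAC", "mCCC", "mCCwC"]      # 'C' marks a root-consonant slot
ROOTS = {"ktb", "drs", "Elm"}                            # triliteral roots
NON_DERIVED = {"fy", "mn", "ElY"}                        # particles, not derived from roots


@dataclass(frozen=True)
class Analysis:
    category: str           # "derived" or "non-derived"
    prefix: str
    base: str               # root for derived words, stem for non-derived words
    pattern: Optional[str]  # morphological pattern; None for non-derived words
    suffix: str


def match_pattern(stem: str, pattern: str) -> Optional[str]:
    """Return the root letters of stem if it fits pattern, else None."""
    if len(stem) != len(pattern):
        return None
    root = []
    for letter, slot in zip(stem, pattern):
        if slot == "C":
            root.append(letter)      # root consonant fills the slot
        elif slot != letter:
            return None              # fixed pattern letter does not match
    return "".join(root)


def analyze(word: str) -> List[Analysis]:
    """Enumerate prefix + stem + suffix splits of word and classify each stem."""
    results = []
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if not (word.startswith(prefix) and word.endswith(suffix)):
                continue
            stem = word[len(prefix):len(word) - len(suffix)]
            if not stem:
                continue
            if stem in NON_DERIVED:
                results.append(Analysis("non-derived", prefix, stem, None, suffix))
            for pattern in PATTERNS:
                root = match_pattern(stem, pattern)
                if root in ROOTS:
                    results.append(Analysis("derived", prefix, root, pattern, suffix))
    return results


def filter_hypotheses(hypotheses: List[str]) -> Dict[str, List[Analysis]]:
    """Keep only word hypotheses that admit at least one morphological analysis."""
    kept = {}
    for word in hypotheses:
        analyses = analyze(word)
        if analyses:
            kept[word] = analyses
    return kept


if __name__ == "__main__":
    # "wktAq" simulates an OCR misreading of the final letter of "wktAb".
    ocr_hypotheses = ["AlkAtb", "wktAb", "wktAq", "mdrsp", "fy"]
    for word, analyses in filter_hypotheses(ocr_hypotheses).items():
        for a in analyses:
            print(word, a)
```

In this toy setup, `AlkAtb` is analyzed as the prefix `Al` plus root `ktb` in pattern `CACC`, while `wktAq` yields the unknown root `ktq` and is discarded. A realistic system would need far larger root, pattern, and affix inventories, and would typically score rather than simply accept or reject hypotheses.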
Keywords :
dictionaries; hidden Markov models; linguistics; natural language processing; optical character recognition; statistical analysis; text analysis; word processing; Arabic language; Hidden Markov model; N-gram model; OCR; affixal approach; dictionary; linguistic based approach; natural language morphology integration; offline Arabic optical text image recognition; statistical model; Dictionaries; Hidden Markov models; Pragmatics; Semantics; Shape; Text recognition; Vocabulary; Arabic text image; linguistic concepts of Arabic vocabulary; morphological characterization of word; off-line recognition; word categorization; Algorithms; Artificial Intelligence; Automatic Data Processing; Image Enhancement; Image Interpretation, Computer-Assisted; Information Storage and Retrieval; Natural Language Processing; Pattern Recognition, Automated;
Journal_Title :
IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics
DOI :
10.1109/TSMCB.2010.2072990