Title :
A Novel Arabic Baseline Estimation Algorithm Based on Sub-Words Treatment
Author :
Boukerma, Hanene ; Farah, Nadir
Author_Institution :
Lab. de Gestion Electron. du Document (LABGED), Univ. 20 Aout 1955, Skikda, Algeria
Abstract :
Baseline detection is an essential preprocessing step for many OCR systems. It directly affects the efficiency and reliability of the character segmentation and feature extraction stages, which in turn contribute strongly to recognition accuracy. For Arabic handwriting, conventional methods that extract the baseline as a single straight line are ill-suited, because an Arabic word may be composed of two or more sub-words (pieces of Arabic words, PAWs), and the arrangement of these sub-words can produce different slant angles within the same word. Focusing on the source of the problem, we propose a novel Arabic baseline estimation algorithm in which the PAW, rather than the word, is the basic unit to be processed. Experimental results on the IFN/ENIT database [1] demonstrate the efficiency of the proposed algorithm.
Keywords :
edge detection; feature extraction; handwritten character recognition; image segmentation; natural languages; optical character recognition; word processing; Arabic handwritten character recognition; OCR system; PAW level; Arabic baseline estimation algorithm; baseline detection; character segmentation reliability; subword treatment; Arabic handwritten; preprocessing; sub-word extraction
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2010 International Conference on
Conference_Location :
Kolkata
Print_ISBN :
978-1-4244-8353-2
DOI :
10.1109/ICFHR.2010.58