Title :
A comparative study between methods of Arabic baseline detection
Author :
AL-Shatnawi, Atallah ; Omar, Khairuddin
Author_Institution :
Dept. of Syst. Sci. & Manage., Univ. Kebangsaan Malaysia, Bangi, Malaysia
Abstract :
Preprocessing is the most important stage in an Arabic OCR system; it directly affects the reliability and efficiency of the segmentation and feature extraction stages. Arabic is written cursively, and each of its characters has between two and four shapes. An Arabic word typically consists of two or more characters connected along an imaginary line called the baseline. Detecting the baseline is one of the main tasks in preprocessing for an Arabic OCR system. The baseline can be used for both skew normalization and character segmentation. This paper lists and clarifies the challenges facing Arabic baseline detection methods and provides a brief comparison of those methods. The comparison is based on the nature of written Arabic, the diacritics (such as dots and zigzags), the word slope, and the subwords found.
Keywords :
handwriting recognition; image segmentation; natural language processing; optical character recognition; Arabic OCR system; Arabic baseline detection; Arabic language; Arabic word; character; cursively written; feature extraction stages; imaginary line; skew normalization; Conference management; Feature extraction; Image edge detection; Image segmentation; Informatics; Natural languages; Optical character recognition software; Pattern recognition; Shape; Writing; Arabic; Baseline; Contour; Handwritten; Horizontal Projection; OCR; Offline; Preprocessing; Principal Component Analysis; Skeleton
Conference_Title :
Electrical Engineering and Informatics, 2009. ICEEI '09. International Conference on
Conference_Location :
Selangor
Print_ISBN :
978-1-4244-4913-2
DOI :
10.1109/ICEEI.2009.5254814