Title :
Sparse descriptor for lexicon reduction in handwritten Arabic documents
Author :
Chherawala, Youssouf ; Wisnovsky, R. ; Cheriet, Mohamed
Author_Institution :
Synchromedia Lab., Ecole de Technol. Super., Montreal, QC, Canada
Abstract :
Arabic words have a rich structure. They are made of subwords (groups of connected letters) and diacritical marks (dots). This paper proposes a sparse descriptor specifically designed for lexicon reduction in handwritten Arabic documents. The topological and geometrical features of subwords are extracted from the skeleton image, based on the concept of local density. The sparse descriptor is then formed as a 3-bin histogram describing the distribution of the skeleton pixels' local density (low, medium or high). This descriptor is then extended to the Arabic word descriptor (AWD), which combines information from all the subwords and diacritics of an Arabic word. This approach is easy to implement and has only one free parameter. It has been evaluated on the Ibn Sina and IFN/ENIT databases with promising results.
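The 3-bin histogram idea can be illustrated with a minimal sketch. The abstract does not define local density or the bin boundaries, so everything specific here is an assumption made for illustration: local density is taken to be the number of other skeleton pixels inside a square window around each skeleton pixel, and the names `local_density`, `density_histogram`, `radius`, `low_max` and `medium_max` are hypothetical. This is not the authors' implementation.

```python
from typing import List, Sequence


def local_density(skeleton: Sequence[Sequence[int]], radius: int = 1) -> List[int]:
    """Return, for each skeleton pixel in row-major order, the number of
    OTHER skeleton pixels inside the (2*radius+1) x (2*radius+1) window
    centred on it. This definition of local density is an assumption."""
    rows = len(skeleton)
    cols = len(skeleton[0]) if rows else 0
    densities: List[int] = []
    for r in range(rows):
        for c in range(cols):
            if not skeleton[r][c]:
                continue  # only skeleton pixels contribute to the descriptor
            count = 0
            for dr in range(-radius, radius + 1):
                for dc in range(-radius, radius + 1):
                    if dr == 0 and dc == 0:
                        continue  # do not count the pixel itself
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and skeleton[rr][cc]:
                        count += 1
            densities.append(count)
    return densities


def density_histogram(
    skeleton: Sequence[Sequence[int]],
    radius: int = 1,
    low_max: int = 1,      # hypothetical threshold: density <= low_max is "low"
    medium_max: int = 2,   # hypothetical threshold: density <= medium_max is "medium"
) -> List[float]:
    """Normalised [low, medium, high] histogram of skeleton-pixel densities."""
    densities = local_density(skeleton, radius)
    bins = [0, 0, 0]
    for d in densities:
        if d <= low_max:
            bins[0] += 1
        elif d <= medium_max:
            bins[1] += 1
        else:
            bins[2] += 1
    total = len(densities)
    return [b / total for b in bins] if total else [0.0, 0.0, 0.0]


# A straight 5-pixel stroke: two end points (low) and three interior points (medium).
print(density_histogram([[1, 1, 1, 1, 1]]))  # [0.4, 0.6, 0.0]
```

With `radius=1` and these thresholds, the low bin collects end points and isolated pixels, the medium bin collects ordinary stroke pixels, and the high bin collects pixels near branch points, which is one plausible way such a histogram could capture subword topology.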
Keywords :
document image processing; feature extraction; handwritten character recognition; natural language processing; visual databases; word processing; 3-bins histogram; AWD; Arabic word descriptor; IFN/ENIT database; Ibn Sina database; diacritical marks; geometrical feature extraction; handwritten Arabic documents; lexicon reduction; skeleton image; skeleton pixel local density distribution; sparse descriptor; subwords; topological feature extraction; Databases; Feature extraction; Geometry; Histograms; Shape; Skeleton; Topology;
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
Print_ISBN :
978-1-4673-2216-4