DocumentCode :
595532
Title :
Sparse descriptor for lexicon reduction in handwritten Arabic documents
Author :
Chherawala, Youssouf ; Wisnovsky, R. ; Cheriet, Mohamed
Author_Institution :
Synchromedia Lab., Ecole de Technol. Super., Montreal, QC, Canada
fYear :
2012
fDate :
11-15 Nov. 2012
Firstpage :
3729
Lastpage :
3732
Abstract :
Arabic words have a rich structure. They are made of subwords (groups of connected letters) and diacritical marks (dots). This paper proposes a sparse descriptor specifically designed for lexicon reduction in handwritten Arabic documents. The topological and geometrical features of subwords are extracted from the skeleton image, based on the concept of local density. The sparse descriptor is then formed as a three-bin histogram describing the distribution of the skeleton pixels' local density (low, medium, or high). This descriptor is then extended to the Arabic word descriptor (AWD), which combines information from all the subwords and diacritics of an Arabic word. The approach is easy to implement and has only one free parameter. It has been evaluated on the Ibn Sina and IFN/ENIT databases with promising results.
Keywords :
document image processing; feature extraction; handwritten character recognition; natural language processing; visual databases; word processing; 3-bins histogram; AWD; Arabic word descriptor; IFN/ENIT database; Ibn Sina database; diacritical marks; geometrical feature extraction; handwritten Arabic documents; lexicon reduction; skeleton image; skeleton pixel local density distribution; sparse descriptor; subwords; topological feature extraction; Databases; Feature extraction; Geometry; Histograms; Shape; Skeleton; Topology;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Pattern Recognition (ICPR), 2012 21st International Conference on
Conference_Location :
Tsukuba
ISSN :
1051-4651
Print_ISBN :
978-1-4673-2216-4
Type :
conf
Filename :
6460975