DocumentCode
2145890
Title
Localization of Digit Strings in Farsi/Arabic Document Images Using Structural Features and Syntactical Analysis
Author
Abedi, Ali ; Faez, Karim
Author_Institution
Electr. Eng. Dept., Amirkabir Univ. of Technol., Tehran, Iran
fYear
2011
fDate
18-21 Sept. 2011
Firstpage
728
Lastpage
733
Abstract
This paper presents a new method for localizing digit strings with a specific syntax in Farsi/Arabic document images. First, features are extracted from every connected component in each text line. These features are designed for Farsi/Arabic scripts and can differentiate between digit and non-digit connected components. The features are then classified, and each connected component is assigned a probability of belonging to each of four classes: digit, slash, double-digit, and non-digit. Next, a discrete hidden Markov model, acting as a syntactic analyzer, localizes digit strings with the desired syntaxes. Results, presented separately for handwritten and machine-printed text lines, are very promising.
Keywords
document image processing; handwriting recognition; hidden Markov models; natural language processing; Farsi-Arabic document images; digit strings localization; discrete hidden Markov model; handwritten text lines; machine printed text lines; structural features; syntactical analysis; Feature extraction; Hidden Markov models; Neodymium; Pattern recognition; Support vector machines; Syntactics; Training; Farsi/Arabic document image analysis; digit strings localization; feature extraction; handwritten dates; syntax verification;
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location
Beijing
ISSN
1520-5363
Print_ISBN
978-1-4577-1350-7
Electronic_ISBN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2011.152
Filename
6065407