DocumentCode :
2143789
Title :
Baseline Dependent Percentile Features for Offline Arabic Handwriting Recognition
Author :
Natarajan, Pradeep ; Belanger, David ; Prasad, Rohit ; Kamali, Matin ; Subramanian, Krishna ; Natarajan, Prem
Author_Institution :
Raytheon BBN Technol., Cambridge, MA, USA
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
329
Lastpage :
333
Abstract :
Handwritten text in Arabic and other languages exhibits significant variations in the slant and baseline of characters across words and also within a single word. Since the concept of baseline does not have a precise mathematical definition, existing approaches use heuristic methods to first identify a set of baseline-relevant pixels and then fit lines or curves through them. However, for statistical features such as the percentiles that we use in our system, we only need an approximate curve that is close to the baseline to normalize the features. Hence we propose a two-stage approach to estimate the approximate baseline. First we segment the text line into a set of components, and then we estimate the baseline in each component using two methods: max projection and smoothed centroid line. We incorporate the computed baseline into percentile feature computation in the BBN Byblos OCR system for an Arabic offline handwriting recognition task. Our new features result in a 1% absolute and 3.1% relative improvement in word error rate on a large test set with 15K handwritten Arabic words, which is statistically significant with p-value < 0.001 using the matched pair comparison test. Further, our results show that computing fine-grained baselines from small line segments is significantly better than estimating a single baseline over the entire text line.
Keywords :
handwritten character recognition; optical character recognition; statistical analysis; BBN Byblos OCR system; baseline dependent percentile features; handwritten text; max projection; offline Arabic handwriting recognition; percentile feature computation; smoothed centroid line; statistical features; Character recognition; Estimation; Feature extraction; Handwriting recognition; Hidden Markov models; Merging; Training; Baseline-dependent percentile; Feature Extraction; handwriting recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.74
Filename :
6065329