DocumentCode :
2747315
Title :
Recognition strategies for general handwritten text documents
Author :
Shridhar, M. ; Houle, G.F. ; Kimura, F.
fYear :
2009
fDate :
7-9 June 2009
Firstpage :
167
Lastpage :
169
Abstract :
This paper presents document recognition strategies for an important application: recognition of text documents containing multiple lines of text data. A project to study the feasibility of recognizing essays written by middle school students is the focus of the second study. In this project, a scanned document is processed to extract individual lines of text from the essay, extract individual words from each line, and then apply word recognition techniques to the extracted words. While individual lines are extracted accurately using gap information between lines, extraction of words is a much bigger challenge. Because the essays are written by middle school children, word boundaries are ambiguous, especially when words are written in a non-cursive, discrete style. In these cases, the gaps between words are sometimes smaller than the gaps between characters within a word, causing errors in estimating the location of word boundaries. In this paper, we propose two classes of word boundaries: 1) strong boundaries due to large gaps between words, and 2) weak boundaries due to small gaps between words. There are also cases in which two words have no clear gap between them and are joined, giving the appearance of a single word. Results obtained from our phase 1 study will be presented in the paper.
Keywords :
document image processing; feature extraction; handwritten character recognition; optical character recognition; text analysis; OCR; essay recognition; handwritten text document recognition strategy; middle school student; noncursive discrete style writing; scanned document processing; word feature extraction; word recognition technique; Cities and towns; Data mining; Educational institutions; Focusing; Handwriting recognition; Image recognition; Law; Legal factors; Postal services; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Electro/Information Technology, 2009. EIT '09. IEEE International Conference on
Conference_Location :
Windsor, ON
Print_ISBN :
978-1-4244-3354-4
Electronic_ISBN :
978-1-4244-3355-1
Type :
conf
DOI :
10.1109/EIT.2009.5189603
Filename :
5189603