Title :
Separating handwritten material from machine printed text using hidden Markov models
Author :
Guo, Jinhong K. ; Ma, Matthew Y.
Author_Institution :
Panasonic Inf. & Networking Technols. Lab., Princeton, NJ, USA
fDate :
2001
Abstract :
In this paper, we address the problem of separating handwritten annotations from machine-printed text within a document. We present an algorithm based on the theory of hidden Markov models (HMMs) to distinguish between machine-printed and handwritten material. No OCR results are required prior to or during the process, and classification is performed at the word level. Handwritten annotations are not limited to marginal areas: the approach can handle document images in which handwritten annotations are overlaid on machine-printed text, and it has shown promise in our experiments. Experimental results show that the proposed method achieves 72.19% recall for fully extracted handwritten words and 90.37% recall for partially extracted words. The precision of extracting handwritten words reaches 92.86%.
Keywords :
document image processing; handwriting recognition; hidden Markov models; document images; document text separation; handwritten annotations; handwritten words extraction; machine-printed text; precision; recall; word-level classification; Data mining; Engines; Handwriting recognition; Hidden Markov models; Image coding; Instruments; Laboratories; Neural networks; Optical character recognition software; Text recognition
Conference_Title :
Document Analysis and Recognition, 2001. Proceedings. Sixth International Conference on
Conference_Location :
Seattle, WA
Print_ISBN :
0-7695-1263-1
DOI :
10.1109/ICDAR.2001.953828