DocumentCode :
183258
Title :
A Comparison of Recognition Strategies for Printed/Handwritten Composite Documents
Author :
Moysset, Bastien ; Messina, Ronaldo ; Kermorvant, Christopher
Author_Institution :
A2iA, Paris, France
fYear :
2014
fDate :
1-4 Sept. 2014
Firstpage :
158
Lastpage :
163
Abstract :
Full-page segmentation and recognition of real-world documents is a challenging task, involving the segmentation of the images (graphics, text) and the subsequent recognition of the detected text zones. Often those documents present zones with both write-types, printed and handwritten, which so far have been dealt with by classifying the zones according to the write-type and then using type-specific models for recognition. Here we present two recognition systems using state-of-the-art recurrent neural networks that can recognize the text in zones with both write-types without the need for explicit type identification; only the segmentation into lines is needed. In one of the systems, there is no distinction of type at the network's output (one output label per character), while in the other there is one output label for each character and write-type. Experiments have been done on real-world documents from the Maurdor competition. These two systems perform at a level similar to that of systems using specific networks per type on the constrained task where there is only one write-type per zone. They perform better when both handwritten and printed text are present in the text zone. The results open the perspective of treating OCR and handwritten text recognition with a single optical model.
Keywords :
handwritten character recognition; image classification; image segmentation; optical character recognition; recurrent neural nets; text detection; Maurdor competition; OCR; constrained task; detected text-zone recognition; full-page real-world document recognition; full-page real-world document segmentation; handwritten composite document recognition strategy; handwritten text recognition; image segmentation; line segmentation; network output; optical model; output label; printed composite document recognition strategy; recurrent neural networks; type-specific models; write-type zones; Adaptive optics; Detectors; Error analysis; Handwriting recognition; Hidden Markov models; Image segmentation; Text recognition; Handwritten; Mixed; Printed; Recurrent neural network; Text recognition;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location :
Heraklion
ISSN :
2167-6445
Print_ISBN :
978-1-4799-4335-7
Type :
conf
DOI :
10.1109/ICFHR.2014.34
Filename :
6981013