Title of article :
Semi-supervised learning for character recognition in historical archive documents
Author/Authors :
Richarz, Jan; Vajda, Szilard; Grzeszick, Rene; Fink, Gernot A.
Issue Information :
Journal issue with serial number, year 2014
Pages :
10
From page :
1011
To page :
1020
Abstract :
Training recognizers for handwritten characters is still a very time-consuming task involving tremendous amounts of manual annotation by experts. In this paper we present semi-supervised labeling strategies that are able to considerably reduce the human effort. We propose two different methods to label and later recognize characters in collections of historical archive documents. The first one is based on clustering of different feature representations and the second one incorporates a simultaneous retrieval on different representations. Hence, both approaches are based on multi-view learning and later apply a voting procedure for reliably propagating annotations to unlabeled data. We evaluate our methods on the MNIST database of handwritten digits and introduce a realistic application in the form of a database of handwritten historical weather reports. The experiments show that our method is able to significantly reduce the human effort that is required to build a character recognizer for the data collection considered while still achieving recognition rates that are close to a supervised classification experiment.
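The clustering-plus-voting idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic details, so the function name `propagate_labels`, the parameter `min_agreement`, the majority rule for naming clusters, and the default unanimity rule are all assumptions made for illustration. Cluster assignments per feature representation ("view") are taken as given.

```python
from collections import Counter


def propagate_labels(view_clusters, seed_labels, min_agreement=None):
    """Propagate a few human labels to unlabeled samples by multi-view voting.

    view_clusters[v][i] is the cluster id of sample i in view v (assumed
    precomputed, e.g. by clustering each feature representation separately).
    seed_labels maps sample index -> label supplied by a human annotator.
    Returns a dict of sample index -> propagated label, containing only the
    samples on which at least `min_agreement` views agree.
    """
    n_views = len(view_clusters)
    if min_agreement is None:
        min_agreement = n_views  # assumption: require unanimous views

    # Step 1: in each view, name every cluster after the majority label
    # of the annotated samples that fall into it.
    cluster_labels = []
    for clusters in view_clusters:
        votes = {}
        for idx, label in seed_labels.items():
            votes.setdefault(clusters[idx], Counter())[label] += 1
        cluster_labels.append(
            {c: counts.most_common(1)[0][0] for c, counts in votes.items()}
        )

    # Step 2: each view casts one vote per unlabeled sample; a label is
    # accepted only if enough views agree on it.
    propagated = {}
    for i in range(len(view_clusters[0])):
        if i in seed_labels:
            continue
        ballot = Counter()
        for v in range(n_views):
            label = cluster_labels[v].get(view_clusters[v][i])
            if label is not None:
                ballot[label] += 1
        if ballot:
            label, count = ballot.most_common(1)[0]
            if count >= min_agreement:
                propagated[i] = label
    return propagated


# Toy example: six samples, two views, two annotated samples.
view_a = [0, 0, 0, 1, 1, 1]
view_b = [0, 0, 1, 1, 1, 1]
seeds = {0: "3", 3: "8"}
result = propagate_labels([view_a, view_b], seeds)
```

In the toy example, sample 2 falls into differently labeled clusters in the two views, so it receives no label and would be left for manual annotation. Rejecting such disagreements is one plausible way to keep propagated annotations reliable at the cost of labeling fewer samples.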
Keywords :
Semi-supervised learning, Character recognition, Historical documents
Journal title :
PATTERN RECOGNITION
Serial Year :
2014
Record number :
1735994