Title of article :
Semi-supervised learning for character recognition in historical archive documents
Author/Authors :
Richarz, Jan; Vajda, Szilard; Grzeszick, Rene; Fink, Gernot A.
Issue Information :
Journal issue with serial number, year 2014
Abstract :
Training recognizers for handwritten characters is still a very time-consuming task involving tremendous amounts of manual annotation by experts. In this paper we present semi-supervised labeling strategies that are able to considerably reduce the human effort. We propose two different methods to label and later recognize characters in collections of historical archive documents. The first one is based on clustering of different feature representations and the second one incorporates a simultaneous retrieval on different representations. Hence, both approaches are based on multi-view learning and later apply a voting procedure for reliably propagating annotations to unlabeled data. We evaluate our methods on the MNIST database of handwritten digits and introduce a realistic application in the form of a database of handwritten historical weather reports. The experiments show that our methods are able to significantly reduce the human effort that is required to build a character recognizer for the data collection considered while still achieving recognition rates that are close to those of a supervised classification experiment.
Keywords :
Semi-supervised learning, Character recognition, Historical documents
Journal title :
PATTERN RECOGNITION