Title : 
Towards Semi-supervised Transcription of Handwritten Historical Weather Reports
         
        
            Author : 
Richarz, Jan ; Vajda, Szilárd ; Fink, Gernot A.
         
        
            Author_Institution : 
Dept. of Comput. Sci., Tech. Univ. Dortmund, Dortmund, Germany
         
        
            Abstract : 
This paper addresses the automatic transcription of handwritten documents with a regular tabular structure. A method for extracting machine-printed tables from images is proposed that uses very little prior knowledge about the document layout. The detected table serves as a query for retrieving and fitting a structural template, which is then used to extract the handwritten text fields. A semi-supervised learning approach is applied to these fields, aiming to minimize the human labeling effort required for recognizer training. The effectiveness of the proposed approach is demonstrated experimentally on a set of historical weather reports. Compared to using all labels, competitive recognition performance is achieved by labeling only a small fraction of the data, keeping the required human effort very low.
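The abstract does not say which semi-supervised scheme is used. As a hypothetical illustration of how labeling "only a small fraction of the data" can still yield a label for every extracted field, the sketch below clusters feature vectors with k-means, asks a human (simulated by an `oracle` callback) for one label per cluster, and propagates that label to all cluster members. The names `kmeans`, `cluster_and_label` and `oracle`, the choice of k-means with farthest-first seeding, and the toy 2-D features are all assumptions for illustration, not the authors' method.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> List[int]:
    """Plain k-means with deterministic farthest-first seeding (an assumption).

    Returns the index of the cluster assigned to each point.
    """
    # Seeding: start from the first point, then repeatedly add the point
    # that is farthest from all centroids chosen so far.
    centroids = [list(points[0])]
    while len(centroids) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centroids))
        centroids.append(list(far))

    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [points[i] for i in range(len(points)) if assign[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def cluster_and_label(
    points: Sequence[Point], k: int, oracle: Callable[[int], str]
) -> Tuple[List[Optional[str]], int]:
    """Query one human label per cluster and propagate it to all members.

    `oracle(i)` is a hypothetical stand-in for the human annotator; it is
    called once per non-empty cluster. Returns (labels, number of queries).
    """
    assign = kmeans(points, k)
    labels: List[Optional[str]] = [None] * len(points)
    queried = 0
    for c in sorted(set(assign)):
        idx = [i for i, a in enumerate(assign) if a == c]
        centre = [sum(col) / len(idx) for col in zip(*(points[i] for i in idx))]
        # Representative = the member closest to the cluster mean.
        rep = min(idx, key=lambda i: math.dist(points[i], centre))
        label = oracle(rep)  # the only manual labeling step
        queried += 1
        for i in idx:
            labels[i] = label
    return labels, queried


# Hypothetical toy "feature vectors" for two digit classes.
points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (0.3, 0.2),
          (5.0, 5.1), (5.2, 4.9), (4.8, 5.0), (5.1, 5.3)]
truth = ["0"] * 4 + ["1"] * 4

labels, queried = cluster_and_label(points, k=2, oracle=lambda i: truth[i])
print(f"labeled {queried} of {len(points)} samples by hand")
print(labels == truth)
```

Here only 2 of the 8 toy samples need a manual label. Farthest-first seeding is used only to keep the example deterministic; with real handwriting features, clusters are rarely perfectly pure, so propagated labels would contain some errors that the subsequent recognizer training has to tolerate.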
         
        
            Keywords : 
feature extraction; geophysics computing; handwritten character recognition; history; image retrieval; learning (artificial intelligence); text analysis; text detection; automatic handwritten document transcription; document layout; handwritten historical weather reports; handwritten text field extraction; human labeling effort minimisation; machine printed table extraction method; query processing; regular tabular structure; semisupervised learning; semisupervised transcription; structural template fitting; structural template retrieval; training recognizer; Handwriting recognition; Humans; Labeling; Meteorology; Principal component analysis; Text analysis; Training; document analysis; handwriting recognition; historical documents; layout analysis; semi-supervised learning;
         
        
            Conference_Title : 
Document Analysis Systems (DAS), 2012 10th IAPR International Workshop on
         
        
            Conference_Location : 
Gold Coast, QLD
         
        
            Print_ISBN : 
978-1-4673-0868-7
         
        
            DOI : 
10.1109/DAS.2012.91