Title :
Human Evaluation of the Transcription Process of a Marriage License Book
Author :
Romero, Veronica ; Andreu Sanchez, Joan
Author_Institution :
Dept. de Sist. Informaticos y Comput., Univ. Politec. de Valencia, Valencia, Spain
Abstract :
Handwritten Text Recognition (HTR) of historical documents is an important research field within Document Image Analysis. Currently, the most widely accepted technology for off-line HTR is based on holistic, segmentation-free techniques that do not require any character or word segmentation. This technology relies on stochastic models trained with annotated data. Its performance is still far from perfect, so user intervention is necessary to obtain perfect transcripts. This intervention can be carried out in a post-editing process, in which the user corrects the errors produced by an automatic HTR system. As an alternative to post-editing, interactive techniques have been proposed in the past few years. In these approaches, the user and the system work in tight mutual collaboration to obtain the perfect transcript, and the feedback provided by the user is used to interactively improve the system output. In both the post-editing and the interactive scenarios, the transcribed material can be used to retrain the models as the data are processed. In this research, we carried out a study with a real transcriber on how the performance of an HTR system improved with the amount of training data, and how human efficiency improved during the transcription process in both scenarios.
Keywords :
document image processing; handwriting recognition; history; user interfaces; automatic HTR system; character segmentation; document image analysis; handwriting text recognition; historical documents; marriage license book; post-editing process; segmentation-free techniques; stochastic models; transcription process; user feedback; user intervention; word segmentation; Data models; Hidden Markov models; Licenses; Manuals; Text recognition; Training; Training data;
Conference_Title :
2013 12th International Conference on Document Analysis and Recognition (ICDAR)
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.254