DocumentCode
3489752
Title
Human Evaluation of the Transcription Process of a Marriage License Book
Author
Romero, Veronica ; Andreu Sanchez, Joan
Author_Institution
Dept. de Sist. Informaticos y Comput., Univ. Politec. de Valencia, Valencia, Spain
fYear
2013
fDate
25-28 Aug. 2013
Firstpage
1255
Lastpage
1259
Abstract
Handwriting Text Recognition (HTR) of historical documents is an important research field of Document Image Analysis. Currently, the most well-accepted technology for off-line HTR is based on holistic, segmentation-free techniques that do not need any kind of character or word segmentation. This HTR technology is based on stochastic models that are trained with annotated data. Its performance is still far from perfect, and therefore user intervention is necessary to obtain perfect transcripts. User intervention can be carried out in a post-editing process, in which the user corrects the errors produced by an automatic HTR system. In the past few years, interactive techniques have been proposed as an alternative to post-editing for obtaining the correct transcript. In these interactive approaches, the user and the system work in tight mutual collaboration to obtain the perfect transcript of the data, and the feedback provided by the user is used to improve the system output. In both the post-editing scenario and the interactive scenario, the transcribed material can be used for retraining the models as the data is processed. In this research we carried out a study with a real transcriber on how the performance of an HTR system improved with respect to the amount of training data, and how human efficiency improved during the transcription process in both transcription scenarios.
Keywords
document image processing; handwriting recognition; history; user interfaces; automatic HTR system; character segmentation; document image analysis; handwriting text recognition; historical documents; marriage license book; post-editing process; segmentation-free techniques; stochastic models; transcription process; user feedback; user intervention; word segmentation; Data models; Hidden Markov models; Licenses; Manuals; Text recognition; Training; Training data
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location
Washington, DC
ISSN
1520-5363
Type
conf
DOI
10.1109/ICDAR.2013.254
Filename
6628815