• DocumentCode
    3489752
  • Title
    Human Evaluation of the Transcription Process of a Marriage License Book
  • Author
    Romero, Veronica ; Andreu Sanchez, Joan
  • Author_Institution
    Dept. de Sist. Informaticos y Comput., Univ. Politec. de Valencia, Valencia, Spain
  • fYear
    2013
  • fDate
    25-28 Aug. 2013
  • Firstpage
    1255
  • Lastpage
    1259
  • Abstract
    Handwriting Text Recognition (HTR) of historical documents is an important research field within Document Image Analysis. Currently, the most widely accepted technology for off-line HTR relies on holistic, segmentation-free techniques that require no character or word segmentation. This technology is based on stochastic models trained with annotated data. Its performance is still far from perfect, so user intervention is necessary to obtain perfect transcripts. This intervention can take the form of post-editing, in which the user corrects the errors produced by an automatic HTR system. As an alternative to post-editing, interactive techniques have been proposed in the past few years. In these interactive approaches, the user and the system work in tight mutual collaboration to obtain the perfect transcript, and the feedback provided by the user is used to improve the system output. In both the post-editing and the interactive scenarios, the transcribed material can be used to retrain the models as the data is processed. In this research we carried out a study with a real transcriber to examine how the performance of an HTR system improved with the amount of training data, and how human efficiency improved during the transcription process in both transcription scenarios.
  • Keywords
    document image processing; handwriting recognition; history; user interfaces; automatic HTR system; character segmentation; document image analysis; handwriting text recognition; historical documents; marriage license book; post-editing process; segmentation-free techniques; stochastic models; transcription process; user feedback; user intervention; word segmentation; Data models; Hidden Markov models; Licenses; Manuals; Text recognition; Training; Training data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
  • Conference_Location
    Washington, DC
  • ISSN
    1520-5363
  • Type
    conf
  • DOI
    10.1109/ICDAR.2013.254
  • Filename
    6628815