• DocumentCode
    153403
  • Title

    Automatic Training Set Generation for Better Historic Document Transcription and Compression

  • Author

    de Franca Pereira e Silva, Gabriel ; Dueire Lins, Rafael ; Gomes, Chandima

  • Author_Institution
    Univ. Fed. de Pernambuco, Recife, Brazil
  • fYear
    2014
  • fDate
    7-10 April 2014
  • Firstpage
    277
  • Lastpage
    281
  • Abstract
    The more complete the training set of an optical character recognition (OCR) platform, the greater the chances of obtaining better transcription precision. Developing a database for this purpose takes considerable effort, as it is done manually and must be as extensive as possible in order to potentially cover all words in a language. Dealing with historic documents, whether handwritten, typed, or printed, is even harder, as such documents are often degraded by time and storage conditions. The recent work of Silva-Lins showed how to automatically generate training sets of isolated characters for the cursive writing of one specific person. This is particularly important in the transcription of historic files of important people. The present work improves that strategy by analyzing letter ligature patterns. The improvement in OCR transcription accuracy for printed, typed, and handwritten documents is borne out by experimental evidence.
  • Keywords
    document image processing; learning (artificial intelligence); optical character recognition; OCR transcription accuracy; automatic training set generation; database; handwritten historic documents; historic document compression; historic document transcription; optical character recognition platform; printed historic documents; typed historic documents; Accuracy; Dictionaries; Noise; Optical character recognition software; Pattern recognition; Training; OCR; documents; font sets; training sets;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems (DAS), 2014 11th IAPR International Workshop on
  • Conference_Location
    Tours
  • Print_ISBN
    978-1-4799-3243-6
  • Type

    conf

  • DOI
    10.1109/DAS.2014.30
  • Filename
    6831013