• DocumentCode
    3524482
  • Title
    Correcting broken characters in the recognition of historical printed documents
  • Author
    Droettboom, Michael
  • Author_Institution
    Digital Knowledge Center, Johns Hopkins Univ., Baltimore, MD, USA
  • fYear
    2003
  • fDate
    27-31 May 2003
  • Firstpage
    364
  • Lastpage
    366
  • Abstract
    We present a new technique for dealing with broken characters, one of the major challenges in the optical character recognition (OCR) of degraded historical printed documents. A technique based on graph combinatorics is used to rejoin the appropriate connected components. It has been applied to real data with successful results.
  • Keywords
    character sets; document image processing; graph theory; history; optical character recognition; OCR; broken character correction; connected component; graph combinatorics; historical printed document; Business; Carbon capture and storage; Character recognition; Combinatorial mathematics; Degradation; Optical character recognition software; Optical design; Printing; Robustness; Shape
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Digital Libraries, 2003. Proceedings. 2003 Joint Conference on
  • Print_ISBN
    0-7695-1939-3
  • Type
    conf
  • DOI
    10.1109/JCDL.2003.1204889
  • Filename
    1204889