• DocumentCode
    1635824
  • Title

    A New Framework for Recognition of Heavily Degraded Characters in Historical Typewritten Documents Based on Semi-Supervised Clustering

  • Author

    Pletschacher, S. ; Hu, J. ; Antonacopoulos, A.

  • Author_Institution
    Pattern Recognition & Image Anal. (PRImA) Res. Lab., Univ. of Salford, Manchester, UK
  • fYear
    2009
  • Firstpage
    506
  • Lastpage
    510
  • Abstract
    This paper presents a new semi-supervised clustering framework for the recognition of heavily degraded characters in historical typewritten documents, where off-the-shelf OCR typically fails. The constraints are generated using typographical (collection-independent) domain knowledge and are used to guide both sample (glyph set) partitioning and metric learning. Experimental results using simple features provide encouraging evidence that this approach can lead to significantly improved clustering results compared to simple K-means clustering, as well as to clustering using a state-of-the-art OCR engine.
  • Keywords
    document image processing; history; learning (artificial intelligence); optical character recognition; pattern clustering; K-means clustering; heavily degraded character recognition framework; historical typewritten document; metric learning formulation; off-the-shelf OCR; sample partitioning; semisupervised clustering framework; typographical domain knowledge; Character recognition; Degradation; Engines; Image analysis; Image recognition; Image texture analysis; Optical character recognition software; Pattern analysis; Pattern recognition; Text analysis; Degraded character recognition; analysis of historical documents; semi-supervised clustering; typewritten documents;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2009.267
  • Filename
    5277612