• DocumentCode
    478633
  • Title
    Lexicon Reduction in Handwriting Recognition Using Topic Categorization
  • Author
    Farooq, Faisal; Chandalia, Gaurav; Govindaraju, Venu
  • fYear
    2008
  • fDate
    16-19 Sept. 2008
  • Firstpage
    369
  • Lastpage
    375
  • Abstract
    Despite several decades of research in handwriting recognition, the goal of having computers access handwritten information from unconstrained document images is still elusive. Current handwriting recognition systems are only capable of recognizing words that are present in a restricted lexicon, typically comprising 10 to 1000 words. As the size of the lexicon grows, the recognition accuracy falls sharply and is reported to be around 30% for a 10K-word lexicon. The objective of this research is to raise the accuracy levels on unconstrained handwritten documents by reducing the size of lexicons. We present an innovative method of lexicon reduction by topic categorization of handwritten documents. After categorizing a document into a topic (e.g., sports or science), we use a smaller lexicon that includes only words with high mutual information with that topic, and hence increase the performance of recognizers. In this paper we present different techniques and report results on a publicly available dataset.
  • Keywords
    Character generation; Color; Data mining; Databases; Government; Handwriting recognition; Medical services; Optical character recognition software; Storage automation; Text analysis; Handwriting Recognition; Lexicon Reduction; Maximum Entropy; Naive Bayes; Topic Classification
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
  • Conference_Location
    Nara, Japan
  • Print_ISBN
    978-0-7695-3337-7
  • Type
    conf
  • DOI
    10.1109/DAS.2008.10
  • Filename
    4669983