DocumentCode :
478633
Title :
Lexicon Reduction in Handwriting Recognition Using Topic Categorization
Author :
Farooq, Faisal ; Chandalia, Gaurav ; Govindaraju, Venu
fYear :
2008
fDate :
16-19 Sept. 2008
Firstpage :
369
Lastpage :
375
Abstract :
Despite several decades of research in handwriting recognition, the goal of having computers access handwritten information from unconstrained document images remains elusive. Current handwriting recognition systems can recognize only words that are present in a restricted lexicon, typically of 10 to 1,000 words. As the lexicon grows, recognition accuracy falls sharply and is reported to be around 30% for a 10K-word lexicon. The objective of this research is to raise accuracy on unconstrained handwritten documents by reducing the size of the lexicon. We present an innovative method of lexicon reduction by topic categorization of handwritten documents. After a document is categorized into a topic (e.g., sports or science), we use a smaller lexicon that includes only words with high mutual information with that topic, thereby improving recognizer performance. In this paper we present several techniques and report results on a publicly available dataset.
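The abstract describes a two-stage pipeline: first assign a document to a topic, then recognize it against a smaller, topic-specific lexicon built from words with high mutual information with that topic. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation. It assumes each training document is reduced to a topic label plus a set of words, uses a multinomial Naive Bayes classifier (one of the two classifiers named in the keywords) on a hypothetical first-pass word list from the handwritten page, and scores words by the mutual information between binary word presence and topic membership, keeping only words positively associated with the topic. All function names and the toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict


def mutual_information(docs):
    """docs: list of (topic, set_of_words).
    Returns {topic: {word: MI(W; T)}} where W is binary word presence and
    T is the binary indicator 'document belongs to topic'."""
    n = len(docs)
    topic_count = Counter(t for t, _ in docs)
    word_count = Counter(w for _, ws in docs for w in ws)
    joint = Counter((t, w) for t, ws in docs for w in ws)
    mi = defaultdict(dict)
    for t in topic_count:
        for w in word_count:
            score = 0.0
            # Sum over the four cells of the 2x2 contingency table.
            for in_topic in (True, False):
                for has_word in (True, False):
                    if in_topic and has_word:
                        n_tw = joint[(t, w)]
                    elif in_topic:
                        n_tw = topic_count[t] - joint[(t, w)]
                    elif has_word:
                        n_tw = word_count[w] - joint[(t, w)]
                    else:
                        n_tw = n - topic_count[t] - word_count[w] + joint[(t, w)]
                    if n_tw == 0:
                        continue  # 0 * log(0) is taken as 0
                    p_t = topic_count[t] / n if in_topic else 1 - topic_count[t] / n
                    p_w = word_count[w] / n if has_word else 1 - word_count[w] / n
                    p_tw = n_tw / n
                    score += p_tw * math.log(p_tw / (p_t * p_w))
            mi[t][w] = score
    return mi


def train_naive_bayes(docs):
    """Multinomial Naive Bayes counts (an assumed choice of classifier)."""
    vocab = {w for _, ws in docs for w in ws}
    prior = Counter(t for t, _ in docs)
    counts = defaultdict(Counter)
    for t, ws in docs:
        counts[t].update(ws)
    return vocab, prior, counts, len(docs)


def classify(model, words):
    """Return the most probable topic for a (hypothetical) first-pass word list."""
    vocab, prior, counts, n = model
    best, best_lp = None, -math.inf
    for t in prior:
        total = sum(counts[t].values())
        lp = math.log(prior[t] / n)
        for w in words:
            if w in vocab:  # words unseen in training are ignored
                # Laplace (add-one) smoothing over the training vocabulary.
                lp += math.log((counts[t][w] + 1) / (total + len(vocab)))
        if lp > best_lp:
            best, best_lp = t, lp
    return best


def reduced_lexicon(mi, docs, topic, k):
    """Top-k words by MI with `topic`, keeping only words that are more
    frequent inside the topic than in the corpus overall (MI alone is
    symmetric and would also reward words that are typically absent)."""
    n = len(docs)
    n_t = sum(1 for t, _ in docs if t == topic)

    def positively_associated(w):
        in_t = sum(1 for t, ws in docs if t == topic and w in ws)
        overall = sum(1 for _, ws in docs if w in ws)
        return in_t / n_t > overall / n

    ranked = sorted((w for w in mi[topic] if positively_associated(w)),
                    key=lambda w: (-mi[topic][w], w))
    return ranked[:k]


docs = [
    ("sports", {"goal", "match", "team", "the"}),
    ("sports", {"team", "score", "match", "the"}),
    ("science", {"cell", "energy", "theory", "the"}),
    ("science", {"theory", "experiment", "cell", "the"}),
]
mi = mutual_information(docs)
topic = classify(train_naive_bayes(docs), ["team", "match", "xyz"])
print(topic, reduced_lexicon(mi, docs, topic, k=3))
```

On this toy corpus the page is classified as `sports` and the reduced lexicon is `['match', 'team', 'goal']`; the stop word `the` occurs in every document, has zero mutual information with either topic, and is excluded. In a real recognizer the reduced lexicon would replace the full lexicon when matching word images.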
Keywords :
Character generation; Color; Data mining; Databases; Government; Handwriting recognition; Medical services; Optical character recognition software; Storage automation; Text analysis; Handwriting Recognition; Lexicon Reduction; Maximum Entropy; Naive Bayes; Topic Classification;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
The Eighth IAPR International Workshop on Document Analysis Systems (DAS '08), 2008
Conference_Location :
Nara, Japan
Print_ISBN :
978-0-7695-3337-7
Type :
conf
DOI :
10.1109/DAS.2008.10
Filename :
4669983