DocumentCode
478633
Title
Lexicon Reduction in Handwriting Recognition Using Topic Categorization
Author
Farooq, Faisal ; Chandalia, Gaurav ; Govindaraju, Venu
fYear
2008
fDate
16-19 Sept. 2008
Firstpage
369
Lastpage
375
Abstract
Despite several decades of research in handwriting recognition, the goal of having computers access handwritten information from unconstrained document images remains elusive. Current handwriting recognition systems can recognize only words that are present in a restricted lexicon, typically comprising 10 to 1000 words. As the size of the lexicon grows, recognition accuracy falls sharply and is reported to be around 30% for a 10K-word lexicon. The objective of this research is to raise accuracy on unconstrained handwritten documents by reducing the size of lexicons. We present an innovative method of lexicon reduction based on topic categorization of handwritten documents. After categorizing a document into a topic (e.g., sports or science), we use smaller lexicons that include only words with high mutual information with that topic, and hence increase the performance of recognizers. In this paper we present different techniques and report results on a publicly available dataset.
Keywords
Character generation; Color; Data mining; Databases; Government; Handwriting recognition; Medical services; Optical character recognition software; Storage automation; Text analysis; Handwriting Recognition; Lexicon Reduction; Maximum Entropy; Naive Bayes; Topic Classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location
Nara, Japan
Print_ISBN
978-0-7695-3337-7
Type
conf
DOI
10.1109/DAS.2008.10
Filename
4669983