Title :
Classification models for historical manuscript recognition
Author :
Feng, S.L. ; Manmatha, R.
Author_Institution :
Multimedia Indexing & Retrieval Group, Massachusetts Univ., Amherst, MA, USA
Date :
29 Aug.-1 Sept. 2005
Abstract :
This paper investigates different machine learning models for the historical handwritten manuscript recognition problem. In particular, we test and compare support vector machines, conditional maximum entropy models, and Naive Bayes with kernel density estimates, and explore their behavior and properties on this problem. We focus on a whole-word problem to avoid character segmentation, which is difficult with degraded handwritten documents. Our results on a publicly available standard dataset of 20 pages of George Washington's manuscripts show that Naive Bayes with Gaussian kernel density estimates significantly outperforms the other models, as well as prior work using hidden Markov models, on this heavily unbalanced dataset.
Keywords :
Bayes methods; Gaussian processes; handwritten character recognition; hidden Markov models; history; image segmentation; learning (artificial intelligence); pattern classification; Gaussian kernel density estimates; George Washington manuscripts; Naive Bayes model; character segmentation; classification models; conditional maximum entropy models; hidden Markov models; historical handwritten manuscript recognition; machine learning models; support vector machines; Character recognition; Entropy; Handwriting recognition; Hidden Markov models; Information retrieval; Kernel; Machine learning; Optical character recognition software; Support vector machine classification; Support vector machines;
Conference_Title :
Document Analysis and Recognition, 2005. Proceedings. Eighth International Conference on
Print_ISBN :
0-7695-2420-6
DOI :
10.1109/ICDAR.2005.73