DocumentCode :
3341393
Title :
Towards Whole-Book Recognition
Author :
Xiu, Pingping ; Baird, Henry S.
Author_Institution :
Comput. Sci. & Eng. Dept., Lehigh Univ., Bethlehem, PA
fYear :
2008
fDate :
16-19 Sept. 2008
Firstpage :
629
Lastpage :
636
Abstract :
We describe experimental results for unsupervised recognition of the textual contents of book-images using fully automatic mutual-entropy-based model adaptation. Each experiment starts with approximate iconic and linguistic models---derived from (generally errorful) OCR results and (generally incomplete) dictionaries---and then runs a fully automatic adaptation algorithm which, guided entirely by evidence internal to the test set, attempts to correct the models for improved accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence probabilities. Our adaptation algorithm detects disagreements between the models by analyzing mutual entropy between (1) the a posteriori probability distribution of character classes (the recognition results from image classification alone), and (2) the a posteriori probability distribution of word classes (the recognition results from image classification combined with linguistic constraints). Disagreements identify candidates for automatic model corrections. We report experiments on 40 textlines in which word error rates fall monotonically with passage lengths. We also report experiments on an enhanced algorithm which can cope with character-segmentation errors (a single split, or a single merge, per word). In order to scale up experiments soon to whole-book images, we have revised data structures and implemented speed enhancements. For this algorithm, we report results on three increasingly long passage lengths: (a) one full page, (b) five pages, and (c) ten pages. We observe that error rates on long words fall monotonically with passage lengths.
Keywords :
document image processing; entropy; image classification; adaptive classification; book recognition; character-image classifier; character-segmentation errors; iconic model; image formation; linguistic model; model adaptation; mutual entropy; word-occurrence probability; Adaptation model; Automatic testing; Character recognition; Error analysis; Error correction; Image analysis; Image recognition; Optical character recognition software; Probability distribution; anytime algorithms; document image recognition; isogeny
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis Systems, 2008. DAS '08. The Eighth IAPR International Workshop on
Conference_Location :
Nara
Print_ISBN :
978-0-7695-3337-7
Type :
conf
DOI :
10.1109/DAS.2008.50
Filename :
4670015