Title :
Scaling Up Whole-Book Recognition
Author :
Xiu, Pingping ; Baird, Henry S.
Author_Institution :
Comput. Sci. & Eng. Dept., Lehigh Univ., Bethlehem, PA, USA
Abstract :
We describe the results of large-scale experiments with algorithms for unsupervised improvement of recognition of book-images using fully automatic mutual-entropy-based model adaptation. Each experiment is initialized with an imperfect iconic model derived from errorful OCR results and a more or less perfect linguistic model, after which our fully automatic adaptation algorithm corrects the iconic model to achieve improved accuracy, guided only by evidence within the test set. Mutual-entropy scores measure disagreements between the two models and identify candidates for iconic model correction. Previously published experiments have shown that word error rates fall monotonically with passage length. Here we show similar results for character error rates over far longer passages, up to fifty pages in length: we observed error rates driven from 25% down to 1.9%. We present new experimental results to support the motivating principle of our strategy: that error rates and mutual-entropy scores are strongly correlated. We also discuss theoretical, algorithmic, and methodological challenges that we have encountered as we scale up experiments towards complete books.
Keywords :
correlation methods; document image processing; entropy; linguistics; optical character recognition; OCR; automatic adaptation algorithm; book-image recognition; character error rate; document image recognition; iconic model correction; linguistic model; mutual-entropy-based model adaptation; Adaptation model; Algorithm design and analysis; Books; Computer science; Drives; Entropy; Error analysis; Error correction; Image recognition; Text analysis; adaptive classification; anytime algorithms; book recognition; isogeny; model adaptation; mutual entropy
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
ISSN :
1520-5363
DOI :
10.1109/ICDAR.2009.22