DocumentCode :
2504567
Title :
Incorporating Linguistic Model Adaptation into Whole-Book Recognition
Author :
Xiu, Pingping ; Baird, Henry S.
Author_Institution :
Dept. of Comput. Sci. & Eng., Lehigh Univ., Bethlehem, PA, USA
fYear :
2010
fDate :
23-26 Aug. 2010
Firstpage :
2057
Lastpage :
2060
Abstract :
Whole-book recognition is a document image analysis strategy that operates on the complete set of a book's page images, using automatic adaptation to improve accuracy. Our algorithm expects to be given approximate iconic and linguistic models (derived from generally errorful OCR results and generally incomplete dictionaries) and then, guided entirely by evidence internal to the test set, corrects the models, yielding improved accuracy. The iconic model describes image formation and determines the behavior of a character-image classifier. The linguistic model describes word-occurrence probabilities. In previous work, we reported that adapting the iconic model alone (with a perfect linguistic model) automatically reduced the word error rate on a 180-page book by a large factor. In this paper, we propose an algorithm that adapts the iconic model and the linguistic model alternately, improving both models on the fly. The linguistic model adaptation method reported here identifies new words and adds them to the dictionary. With 64.6% of words missing from the initial dictionary, our previous algorithm reduced the word error rate from 40.2% to 23.2%. The new algorithm drives the word error rate down further, from 23.2% to 16.0%.
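The absolute word error rates above can also be read as relative reductions. The short calculation below is a worked illustration added for clarity, not part of the paper; the helper name `relative_reduction` is hypothetical, and the only inputs are the three error rates stated in the abstract.

```python
def relative_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent, rounded to 0.1."""
    return round(100.0 * (before - after) / before, 1)

# Word error rates (%) as reported in the abstract.
baseline, previous, new = 40.2, 23.2, 16.0

print(relative_reduction(baseline, previous))  # previous algorithm vs. initial: 42.3
print(relative_reduction(previous, new))       # new algorithm vs. previous: 31.0
print(relative_reduction(baseline, new))       # overall: 60.2
```

Read this way, adding linguistic model adaptation removes roughly a third of the word errors that remained after iconic-only adaptation.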
Keywords :
document image processing; image recognition; linguistics; probability; automatic adaptation; character image classifier; document image analysis; image formation; linguistic model adaptation; occurrence probabilities; whole book recognition; word error rate; Adaptation model; Books; Character recognition; Dictionaries; Error analysis; Image recognition; Pragmatics; adaptive classification; book recognition; disagreement; model adaptation; unsupervised learning;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Pattern Recognition (ICPR), 2010 20th International Conference on
Conference_Location :
Istanbul
ISSN :
1051-4651
Print_ISBN :
978-1-4244-7542-1
Type :
conf
DOI :
10.1109/ICPR.2010.1137
Filename :
5597279