Title :
OCR Post-processing Using Weighted Finite-State Transducers
Author :
Llobet, Rafael ; Cerdan-Navarro, J.-R. ; Perez-Cortes, Juan-Carlos ; Arlandis, Joaquim
Author_Institution :
Inst. Tecnol. de Inf., Univ. Politec. de Valencia, Valencia, Spain
Abstract :
A new approach to Stochastic Error-Correcting Language Modeling based on Weighted Finite-State Transducers (WFSTs) is proposed for post-processing the output of an optical character recognizer (OCR). Instead of using the recognized string as the input to the transducer, our approach uses the complete set of OCR hypotheses (a sequence of vectors of a posteriori class probabilities) to build a WFST. That WFST is then composed with independent WFSTs for the error and language models. This combines the practical advantages of a decoupled (OCR + post-processor) model with the full power of an integrated model.
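The abstract describes a pipeline: build a hypothesis WFST from the per-position posteriors, compose it with an error model and a language model, then take a Viterbi best path. A toy pure-Python sketch of that pipeline follows. It is not the authors' implementation, and every specific in it is hypothetical and chosen for illustration:

- the minimal tropical-semiring `WFST` class and the helper names;
- the substitution-only error model, which omits insertions and deletions so that composition stays epsilon-free;
- the add-one-smoothed character bigram language model;
- the four-letter alphabet and all probabilities.

```python
import heapq
import math
from collections import defaultdict

class WFST:
    """Tropical-semiring WFST: weights are costs (-log p), summed along a path."""
    def __init__(self, start):
        self.start = start
        self.finals = {}               # state -> final cost
        self.arcs = defaultdict(list)  # state -> [(ilabel, olabel, cost, next_state)]

    def add_arc(self, src, ilabel, olabel, cost, dst):
        self.arcs[src].append((ilabel, olabel, cost, dst))

def hypothesis_wfst(posteriors):
    """Linear lattice: at position i, one arc per class with nonzero posterior."""
    t = WFST(0)
    for i, probs in enumerate(posteriors):
        for label, p in probs.items():
            if p > 0:
                t.add_arc(i, label, label, -math.log(p), i + 1)
    t.finals[len(posteriors)] = 0.0
    return t

def error_wfst(alphabet, p_keep=0.9):
    """One-state transducer: OCR label -> intended char (substitutions only)."""
    t = WFST(0)
    p_sub = (1.0 - p_keep) / (len(alphabet) - 1)
    for a in alphabet:
        for b in alphabet:
            t.add_arc(0, a, b, -math.log(p_keep if a == b else p_sub), 0)
    t.finals[0] = 0.0
    return t

def bigram_lm_wfst(corpus, alphabet):
    """Add-one-smoothed character bigram acceptor; state = previous char."""
    counts, totals = defaultdict(int), defaultdict(int)
    for word in corpus:
        prev = "<s>"
        for ch in word:
            counts[prev, ch] += 1
            totals[prev] += 1
            prev = ch
    t = WFST("<s>")
    for prev in ["<s>", *alphabet]:
        for ch in alphabet:
            p = (counts[prev, ch] + 1) / (totals[prev] + len(alphabet))
            t.add_arc(prev, ch, ch, -math.log(p), ch)
        t.finals[prev] = 0.0
    return t

def compose(a, b):
    """Epsilon-free composition: a's output labels must match b's input labels."""
    out = WFST((a.start, b.start))
    stack, seen = [out.start], {out.start}
    while stack:
        q = stack.pop()
        q1, q2 = q
        if q1 in a.finals and q2 in b.finals:
            out.finals[q] = a.finals[q1] + b.finals[q2]
        for i1, o1, w1, n1 in a.arcs[q1]:
            for i2, o2, w2, n2 in b.arcs[q2]:
                if o1 == i2:
                    nxt = (n1, n2)
                    out.add_arc(q, i1, o2, w1 + w2, nxt)
                    if nxt not in seen:
                        seen.add(nxt)
                        stack.append(nxt)
    return out

def best_path(t):
    """Viterbi best path via Dijkstra (all costs >= 0); returns (output, cost)."""
    final = object()  # virtual super-final state, entered by paying a final cost
    heap, done, tick = [(0.0, 0, t.start, "")], set(), 1
    while heap:
        cost, _, q, text = heapq.heappop(heap)
        if q is final:
            return text, cost
        if q in done:
            continue
        done.add(q)
        if q in t.finals:
            heapq.heappush(heap, (cost + t.finals[q], tick, final, text))
            tick += 1
        for _, olabel, w, nxt in t.arcs[q]:
            if nxt not in done:
                heapq.heappush(heap, (cost + w, tick, nxt, text + olabel))
                tick += 1
    return None, math.inf

def correct(posteriors, alphabet="acot", corpus=("cat",)):
    """Hypotheses o error model o language model, then best path."""
    search = compose(hypothesis_wfst(posteriors), error_wfst(alphabet))
    return best_path(compose(search, bigram_lm_wfst(corpus, alphabet)))

# Hypothetical posteriors whose top-1 reading is "cot".
print(correct([{"c": 0.9, "o": 0.1}, {"o": 0.6, "a": 0.4}, {"t": 0.9, "a": 0.1}]))
# Top-1 string only, as one-hot posteriors.
print(correct([{"c": 1.0}, {"o": 1.0}, {"t": 1.0}]))
```

In this toy setting the first call returns "cat" (total cost about 4.192). The second call, which sees only the top-1 string "cot", keeps "cot". Here, keeping the full posteriors lets the language model override a weak OCR decision, whereas a string-only input makes every substitution too expensive. A real system would more likely rely on a WFST library such as OpenFst, which also handles epsilon arcs for insertions and deletions.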
Keywords :
finite state machines; optical character recognition; probability; OCR hypotheses; OCR post-processing; posteriori class probability; stochastic error-correcting language modeling; weighted finite-state transducers; Biological system modeling; Computational modeling; Optical character recognition software; Probabilistic logic; Stochastic processes; Transducers; Viterbi algorithm; Language Modeling; Weighted Finite-State Automata;
Conference_Title :
2010 20th International Conference on Pattern Recognition (ICPR)
Conference_Location :
Istanbul
Print_ISBN :
978-1-4244-7542-1
DOI :
10.1109/ICPR.2010.498