DocumentCode :
2021618
Title :
Cryptogram Decoding for OCR Using Numerization Strings
Author :
Huang, Gary ; Learned-Miller, Erik ; McCallum, Andrew
Author_Institution :
Univ. of Massachusetts, Amherst
Volume :
1
fYear :
2007
fDate :
23-26 Sept. 2007
Firstpage :
208
Lastpage :
212
Abstract :
OCR systems for printed documents typically require large numbers of font styles and character models to work well. When given an unseen font, performance degrades even in the absence of noise. In this paper, we perform OCR in an unsupervised fashion without using any character models by using a cryptogram decoding algorithm. We present results on real and artificial OCR data.
Keywords :
cryptography; document image processing; optical character recognition; pattern clustering; OCR; character models; cryptogram decoding; font styles; numerization strings; printed documents; Computer science; Cryptography; Hamming distance; Hidden Markov models; Image coding; Ink; Intrusion detection; Iterative decoding; Optical character recognition software; Robustness
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
Conference_Location :
Parana
ISSN :
1520-5363
Print_ISBN :
978-0-7695-2822-9
Type :
conf
DOI :
10.1109/ICDAR.2007.4378705
Filename :
4378705