DocumentCode :
3695050
Title :
Solving substitution ciphers for OCR with a semi-supervised hidden Markov model
Author :
Erik Scharwächter;Stephan Vogel
Author_Institution :
Qatar Computing Research Institute, Doha, Qatar
fYear :
2015
Firstpage :
11
Lastpage :
15
Abstract :
In the past, unsupervised HMM training has been applied to solve letter substitution ciphers as they appear in various problems in Natural Language Processing. For some problems, parts of the cipher key can easily be provided by the user, but full manual deciphering would be too time-consuming. In this work, a semi-supervised HMM deciphering approach that uses partial ground-truth data is introduced and evaluated empirically on synthetic and real-life data for Arabic Optical Character Recognition (OCR). Adding only a small amount of supervision improves deciphering performance drastically under optimal conditions, especially for short ciphertexts. In complex real-life scenarios, results are better than with the unsupervised baseline approach.
Keywords :
"Hidden Markov models","Training","Ciphers","Data models","Optical character recognition software","Shape","Error analysis"
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type :
conf
DOI :
10.1109/ICDAR.2015.7333716
Filename :
7333716