DocumentCode
3695050
Title
Solving substitution ciphers for OCR with a semi-supervised hidden Markov model
Author
Erik Scharwächter;Stephan Vogel
Author_Institution
Qatar Computing Research Institute, Doha, Qatar
fYear
2015
Firstpage
11
Lastpage
15
Abstract
In the past, unsupervised HMM training has been applied to solve letter substitution ciphers as they appear in various problems in Natural Language Processing. For some problems, parts of the cipher key can easily be provided by the user, but full manual deciphering would be too time-consuming. In this work, a semi-supervised HMM deciphering approach that uses partial ground-truth data is introduced and evaluated empirically on synthetic and real-life data for Arabic Optical Character Recognition (OCR). Adding only a small amount of supervision improves deciphering performance drastically under optimal conditions, especially for short ciphertexts. In complex real-life scenarios, results are better than with the unsupervised baseline approach.
Keywords
"Hidden Markov models","Training","Ciphers","Data models","Optical character recognition software","Shape","Error analysis"
Publisher
IEEE
Conference_Titel
Document Analysis and Recognition (ICDAR), 2015 13th International Conference on
Type
conf
DOI
10.1109/ICDAR.2015.7333716
Filename
7333716