DocumentCode :
2142239
Title :
HMM-Based Alignment of Inaccurate Transcriptions for Historical Documents
Author :
Fischer, Andreas ; Indermühle, Emanuel ; Frinken, Volkmar ; Bunke, Horst
Author_Institution :
Inst. of Comput. Sci. & Appl. Math., Univ. of Bern, Bern, Switzerland
fYear :
2011
fDate :
18-21 Sept. 2011
Firstpage :
53
Lastpage :
57
Abstract :
For historical documents, available transcriptions are typically inaccurate when compared with the scanned document images. Not only are the positions of the words and sentences unknown, but the transcription may also not match the text in the image exactly. An error-tolerant alignment is needed to make the document images amenable to browsing and searching in digital libraries. In this paper, we propose a novel multi-pass alignment method based on Hidden Markov Models (HMM) that combines text line recognition, string alignment, and keyword spotting to cope with word substitutions, deletions, and insertions in the transcription. In a segmentation-free approach, transcriptions of complete pages are aligned with sequences of text line images. On the Parzival data set, results are reported for several degrees of artificial distortion. Both the accuracy and the efficiency of the proposed system are promising for real-world applications.
Keywords :
document image processing; hidden Markov models; text analysis; HMM-based alignment; Parzival data set; browsing; digital library; error-tolerant alignment; hidden Markov model; historical documents; image transcription; inaccurate transcription; keyword spotting; multipass alignment method; scanned document image; searching; string alignment; text line recognition; Accuracy; Feature extraction; Handwriting recognition; Image segmentation; Text recognition
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Document Analysis and Recognition (ICDAR), 2011 International Conference on
Conference_Location :
Beijing
ISSN :
1520-5363
Print_ISBN :
978-1-4577-1350-7
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2011.20
Filename :
6065275