DocumentCode :
183435
Title :
Semiautomatic Text Baseline Detection in Large Historical Handwritten Documents
Author :
Bosch, Vicente ; Toselli, Alejandro Hector ; Vidal, Enrique
Author_Institution :
PRHLT Res. Center, Univ. Politec. Valencia, Valencia, Spain
fYear :
2014
fDate :
1-4 Sept. 2014
Firstpage :
690
Lastpage :
695
Abstract :
A semiautomatic iterative process for the detection of text baselines in historical handwritten document images is presented. It relies on Hidden Markov Models (HMMs) to provide initial text baseline hypotheses, followed by user review in order to produce ground-truth quality results. Using the set of revised baselines as ground truth, the HMMs are re-trained before processing the next batch of pages. This process has been evaluated in the context of a real transcription task which, as a by-product, has produced line-detection ground truth. We show that a formal, HMM-based line-detection approach, which requires training data, not only yields good detection results but is also of practical use in large handwritten image collections. Through experiments with real users we show that the proposed approach has interesting features, namely accuracy, scalability and ease of use, as well as low overall human effort requirements.
Keywords :
document image processing; handwritten character recognition; hidden Markov models; text detection; HMM-based line-detection approach; hidden Markov model; historical handwritten document image; semiautomatic iterative process; semiautomatic text baseline detection; training data; Accuracy; Feature extraction; Hidden Markov models; Image segmentation; Layout; Training; Vectors; baseline detection; ground truth creation; process;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location :
Heraklion
ISSN :
2167-6445
Print_ISBN :
978-1-4799-4335-7
Type :
conf
DOI :
10.1109/ICFHR.2014.121
Filename :
6981100