Title :
Semiautomatic Text Baseline Detection in Large Historical Handwritten Documents
Author :
Bosch, Vicente ; Toselli, Alejandro Hector ; Vidal, Enrique
Author_Institution :
PRHLT Res. Center, Univ. Politec. Valencia, Valencia, Spain
Abstract :
A semiautomatic iterative process for detecting text baselines in historical handwritten document images is presented. It relies on Hidden Markov Models (HMMs) to provide initial text baseline hypotheses, which users then review to produce ground-truth quality results. Using the set of revised baselines as ground truth, the HMMs are re-trained before processing the next batch of pages. This process has been evaluated in the context of a real transcription task which, as a by-product, has produced line-detection ground truth. We show that a formal, HMM-based line-detection approach, which requires training data, not only yields good detection results but is also of practical use in large handwritten image collections. Through experiments with real users we show that the proposed approach is accurate, scalable, and easy to use, and requires low overall human effort.
Keywords :
document image processing; handwritten character recognition; hidden Markov models; text detection; HMM-based line-detection approach; hidden Markov model; historical handwritten document image; semiautomatic iterative process; semiautomatic text baseline detection; training data; Accuracy; Feature extraction; Hidden Markov models; Image segmentation; Layout; Training; Vectors; baseline detection; ground truth creation; process;
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location :
Heraklion
Print_ISBN :
978-1-4799-4335-7
DOI :
10.1109/ICFHR.2014.121