Title :
Text Line Extraction Using DMLP Classifiers for Historical Manuscripts
Author :
Baechler, Micheal ; Liwicki, Marcus ; Ingold, Rolf
Author_Institution :
Dept. of Inf., Univ. of Fribourg, Fribourg, Switzerland
Abstract :
This paper proposes a novel text line extraction method for historical documents. The method works in two steps. In the first step, layout analysis is performed to recognize the physical structure of a given document using a classification technique: the pixels of a coloured document image are classified into five classes, namely text-block, core-text-line, decoration, background, and periphery. This layout recognition is achieved by a cascade of two Dynamic Multilayer Perceptron (DMLP) classifiers and works without binarisation. In the second step, an algorithm takes the layout recognition results as input, extracts the text lines, and groups them into blocks using the connected components approach. Finally, the algorithm refines the boundaries of the text lines using the binary image and the layout recognition results. The system is evaluated on three historical manuscripts with a test set of 49 pages. The best obtained hit rate for text lines is 96.3%.
Keywords :
document image processing; feature extraction; history; image classification; image colour analysis; multilayer perceptrons; DMLP classifiers; background class; binary image; coloured document image classification; connected components approach; core-text-line class; decoration class; dynamic multilayer perceptron classifiers; historical documents; historical manuscripts; layout analysis; layout recognition; periphery class; text line extraction method; text-block class; Feature extraction; Image resolution; Image segmentation; Layout; Neurons; Text analysis; Training;
Conference_Title :
Document Analysis and Recognition (ICDAR), 2013 12th International Conference on
Conference_Location :
Washington, DC
DOI :
10.1109/ICDAR.2013.206