Title :
Handwritten/Printed Text Separation Using Pseudo-Lines for Contextual Re-labeling
Author :
Awal, Ahmad Montaser ; Belaid, Abdel ; D'Andecy, Vincent Poulain
Author_Institution :
Univ. de Lorraine LORIA, Vandoeuvre-les-Nancy, France
Abstract :
This paper addresses the separation of machine-printed and handwritten text in real, noisy documents. In previous work, we proposed a robust separation system relying on a proximity string segmentation algorithm. The extracted pseudo-lines and pseudo-words serve as the basic blocks for classification. A multi-class support vector machine (SVM) with a Gaussian kernel first assigns a label to each pseudo-word. The local neighborhood of each pseudo-word is then examined to propagate context and correct classification errors. In this work, we first model the separation problem with conditional random fields over the horizontal neighborhood. Because this neighborhood is too local to resolve certain error cases, we enhance the method with a more global context based on class dominance within the pseudo-line. The method has been evaluated on business documents. It separates handwritten and printed text with high scores (99.1% and 99.2%, respectively), whereas noise, which is highly irregular in these documents, obtains a lower score (90.1%).
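The idea of re-labeling by class dominance within a pseudo-line can be illustrated with a minimal sketch. The abstract does not specify the dominance rule, so the majority vote, the `min_share` threshold, and the function name `relabel_by_dominance` below are assumptions made for illustration only, not the authors' method.

```python
from collections import Counter


def relabel_by_dominance(labels, min_share=0.6):
    """Relabel the pseudo-words of one pseudo-line to its dominant class.

    labels    : per-pseudo-word class labels, e.g. from a first-pass classifier
    min_share : hypothetical threshold; the line is relabeled only when the
                most frequent class covers at least this fraction of the line
    """
    if not labels:
        return []
    dominant, count = Counter(labels).most_common(1)[0]
    if count / len(labels) < min_share:
        # No class is dominant enough: keep the first-pass labels unchanged.
        return list(labels)
    return [dominant] * len(labels)


# One pseudo-line where a single pseudo-word was labeled differently.
line = ["printed", "printed", "handwritten", "printed", "printed"]
print(relabel_by_dominance(line))
```

A hard majority vote like this would also overwrite genuinely mixed lines (for example, a printed form field followed by a handwritten entry), which is why the threshold is exposed as a parameter in this sketch.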
Keywords :
Gaussian processes; document image processing; handwriting recognition; image classification; image segmentation; random processes; support vector machines; text detection; Gaussian kernel; SVM; business document; classification error; conditional random field; contextual relabeling; handwritten text separation; handwritten/printed text separation; machine printed text separation; multiclass support vector machine; proximity string segmentation algorithm; pseudo-lines; pseudo-words; real noisy document; robust separation system; Business; Classification algorithms; Context modeling; Feature extraction; Labeling; Noise; Support vector machines; contextual analysis; document segmentation; patch classification; printed/handwritten/noise separation; pseudo-line and pseudo-word extraction;
Conference_Title :
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location :
Heraklion
Print_ISBN :
978-1-4799-4335-7
DOI :
10.1109/ICFHR.2014.13