Title :
Markov Random Field Based Text Identification from Annotated Machine Printed Documents
Author :
Peng, Xujun ; Setlur, Srirangaraj ; Govindaraju, Venu ; Sitaram, Ramachandrula ; Bhuvanagiri, Kiran
Author_Institution :
Dept. of Comput. Sci. & Eng., SUNY at Buffalo, Amherst, NY, USA
Abstract :
In this paper, we describe an approach to segmenting handwritten text, machine printed text and noise from annotated machine printed documents. Three categories of word level features are extracted. We use a modified K-Means clustering algorithm for classification, followed by a relabeling procedure using a Markov Random Field (MRF) based on a concept of neighboring patches and Belief Propagation (BP) rules. Experimental results on an imbalanced data set show that our approach achieves an overall recall of 96.33%.
Keywords :
Markov processes; document image processing; feature extraction; image classification; image segmentation; pattern clustering; random processes; text analysis; Markov random field; annotated machine printed document; belief propagation; feature extraction; k-mean clustering algorithm; machine printed text; segment handwritten text; text identification; Classification algorithms; Feature extraction; Gabor filters; Handwriting recognition; Hidden Markov models; Image segmentation; Markov random fields; Optical character recognition software; Text analysis; Text recognition;
Conference_Title :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
DOI :
10.1109/ICDAR.2009.237