Title :
Automatic Discrimination between Printed and Handwritten Text in Documents
Author :
da Silva, Leonardo F. ; Conci, Aura ; Sanchez, Angel
Author_Institution :
Inst. de Comput., Univ. Fed. Fluminense - UFF, Niteroi, Brazil
Abstract :
Recognition techniques for printed and handwritten text in scanned documents differ significantly. In this paper we address the problem of identifying each type. The process comprises at least four steps: digitization, preprocessing, feature extraction, and decision or classification. A new aspect of our approach is the use of data mining techniques in the decision step. A new set of features extracted from each word is also proposed. Classification rules are mined and used to distinguish printed text from handwritten text. The proposed system was tested on two public image databases. All the efficiency measures considered were computed, and each exceeded 80% in every test.
Keywords :
data mining; document image processing; feature extraction; handwritten character recognition; image classification; image segmentation; optical character recognition; text analysis; classification rule mining; document automatic text discrimination; handwritten text; printed text; public image databases; scanned documents; text recognition; Character recognition; Classification tree analysis; Computer graphics; Hidden Markov models; Image databases; Image processing; Optical character recognition software; Machine Vision; document analysis; text identification
Conference_Title :
Computer Graphics and Image Processing (SIBGRAPI), 2009 XXII Brazilian Symposium on
Conference_Location :
Rio de Janeiro
Print_ISBN :
978-1-4244-4978-1
Electronic_ISBN :
1550-1834
DOI :
10.1109/SIBGRAPI.2009.40