Title :
Text and Non-text Segmentation and Classification from Document Images
Author :
Ibrahim, Zaidah ; Isa, Dino ; Rajkumar, Rajprasad
Author_Institution :
Fac. of Inf. Technol. & Quantitative Sci., Univ. Technol. MARA, Shah Alam
Abstract :
Text and non-text segmentation and classification is an important step in a document layout analysis system, performed before a document image is passed to an OCR system. Heuristic rules have been used to segment and classify the text and non-text blocks. This research focuses on classifying non-text blocks in technical documents into tables, graphs, and figures. A comparative study between a backpropagation neural network and a support vector machine shows that the support vector machine classifies these blocks better than the backpropagation neural network.
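The abstract does not say which heuristic rules separate text from non-text blocks. The sketch below is illustrative only. It assumes a binarised block given as rows of 0/1 pixels and applies two hypothetical rules. First, text tends to form short horizontal bands of ink separated by blank rows in the horizontal projection profile. Second, its ink density falls in a moderate range. The names `ink_density`, `line_bands` and `classify_block`, as well as the default thresholds, are assumptions and do not come from the paper.

```python
"""Hypothetical rule-based text / non-text labelling of a binarised block.

The features and thresholds are illustrative assumptions, not the rules
used in the paper.
"""
from typing import List, Tuple

Bitmap = List[List[int]]  # rows of pixels: 0 = background, 1 = ink


def ink_density(block: Bitmap) -> float:
    """Fraction of pixels in the block that are ink."""
    total = sum(len(row) for row in block)
    return sum(sum(row) for row in block) / total if total else 0.0


def line_bands(block: Bitmap) -> List[int]:
    """Heights of consecutive runs of rows that contain any ink.

    In the horizontal projection profile, text usually appears as several
    short bands separated by blank rows.
    """
    bands, run = [], 0
    for row in block:
        if any(row):
            run += 1
        elif run:
            bands.append(run)
            run = 0
    if run:
        bands.append(run)
    return bands


def classify_block(block: Bitmap, max_line_height: int = 3,
                   density_range: Tuple[float, float] = (0.05, 0.6)) -> str:
    """Return 'text' or 'non-text' using two assumed heuristic rules."""
    bands = line_bands(block)
    lo, hi = density_range
    looks_like_lines = bool(bands) and max(bands) <= max_line_height
    plausible_density = lo <= ink_density(block) <= hi
    return "text" if looks_like_lines and plausible_density else "non-text"
```

A block made of two 2-row ink bands separated by blank rows is labelled `"text"`. A solid block of ink (one tall band, density 1.0) is labelled `"non-text"`. The paper's actual contribution is the next stage, which assigns non-text blocks to table, graph, or figure classes using a trained classifier. These rules do not cover that stage.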
Keywords :
backpropagation; image classification; image segmentation; neural nets; support vector machines; text analysis; OCR system; backpropagation neural network; document images; document layout analysis system; nontext classification; nontext segmentation; support vector machine; text classification; text segmentation; Computer science; Labeling; Neural networks; Pixel; Software engineering; Support vector machine classification; non-text segmentation; zoning
Conference_Title :
Computer Science and Software Engineering, 2008 International Conference on
Conference_Location :
Wuhan, Hubei
Print_ISBN :
978-0-7695-3336-0
DOI :
10.1109/CSSE.2008.1516