DocumentCode
3209221
Title
Automatic text block separation in document images
Author
Arvind, K.R. ; Pati, Peeta Basa ; Ramakrishnan, A.G.
Author_Institution
Indian Inst. of Sci., Bangalore
fYear
2006
fDate
Oct. 15 2006-Dec. 18 2006
Firstpage
53
Lastpage
58
Abstract
Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR involving printed text recognition. In the present work, we compare the efficacy of some feature-classifier combinations to carry out this separation task. We have selected the length-normalized horizontal projection profile (HPP) as the starting point of such a separation task. This is with the assumption that the printed text blocks contain lines of text which generate HPPs with some regularity. Such an assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well-known classifiers, namely, nearest neighbor, linear discriminant function, SVMs and artificial neural networks, have been considered, and the efficiency of the combination of these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96%.
Keywords
document image processing; eigenvalues and eigenfunctions; neural nets; pattern classification; support vector machines; text analysis; Fisher profiles; OCR; artificial neural networks; document images; eigen profile; feature classification; length-normalized horizontal projection profile; linear discriminant function; nearest neighbor; printed text blocks separation; printed text recognition; sequential floating feature selection; support vector machine; Artificial neural networks; Data mining; Hidden Markov models; Image analysis; Information analysis; Laboratories; Nearest neighbor searches; Optical character recognition software; Shape; Text recognition; Eigen profiles; Fisher profiles; horizontal projection profile;
fLanguage
English
Publisher
IEEE
Conference_Titel
Intelligent Sensing and Information Processing, 2006. ICISIP 2006. Fourth International Conference on
Conference_Location
Bangalore
Print_ISBN
1-4244-0612-9
Electronic_ISBN
1-4244-0612-9
Type
conf
DOI
10.1109/ICISIP.2006.4286061
Filename
4286061