DocumentCode :
3209221
Title :
Automatic text block separation in document images
Author :
Arvind, K.R. ; Pati, Peeta Basa ; Ramakrishnan, A.G.
Author_Institution :
Indian Inst. of Sci., Bangalore
fYear :
2006
fDate :
Oct. 15 2006-Dec. 18 2006
Firstpage :
53
Lastpage :
58
Abstract :
Separation of printed text blocks from non-text areas containing signatures, handwritten text, logos and other such symbols is a necessary first step for an OCR system that recognizes printed text. In the present work, we compare the efficacy of several feature-classifier combinations for this separation task. We have selected the length-normalized horizontal projection profile (HPP) as the starting point. This rests on the assumption that printed text blocks contain lines of text which generate HPPs with some regularity; this assumption is demonstrated to be valid. Our features are the HPP and two transformed versions of it, namely the eigen and Fisher profiles. Four well-known classifiers, namely nearest neighbor, linear discriminant function, SVMs and artificial neural networks, have been considered, and the efficiency of combining these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of this separation task. The results give an average accuracy of about 96%.
Keywords :
document image processing; eigenvalues and eigenfunctions; neural nets; pattern classification; support vector machines; text analysis; Fisher profiles; OCR; artificial neural networks; document images; eigen profile; feature classification; length-normalized horizontal projection profile; linear discriminant function; nearest neighbor; printed text blocks separation; printed text recognition; sequential floating feature selection; support vector machine; Artificial neural networks; Data mining; Hidden Markov models; Image analysis; Information analysis; Laboratories; Nearest neighbor searches; Optical character recognition software; Shape; Text recognition; Eigen profiles; Fisher profiles; horizontal projection profile;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Intelligent Sensing and Information Processing, 2006. ICISIP 2006. Fourth International Conference on
Conference_Location :
Bangalore
Print_ISBN :
1-4244-0612-9
Electronic_ISBN :
1-4244-0612-9
Type :
conf
DOI :
10.1109/ICISIP.2006.4286061
Filename :
4286061