• DocumentCode
    3209221
  • Title

    Automatic text block separation in document images

  • Author

    Arvind, K.R. ; Pati, Peeta Basa ; Ramakrishnan, A.G.

  • Author_Institution
    Indian Inst. of Sci., Bangalore
  • fYear
    2006
  • fDate
    Oct. 15 2006-Dec. 18 2006
  • Firstpage
    53
  • Lastpage
    58
  • Abstract
    Separation of printed text blocks from the non-text areas, containing signatures, handwritten text, logos and other such symbols, is a necessary first step for an OCR system that recognizes printed text. In the present work, we compare the efficacy of several feature-classifier combinations for this separation task. We have selected the length-normalized horizontal projection profile (HPP) as the starting point, on the assumption that printed text blocks contain lines of text which generate HPPs with some regularity. This assumption is demonstrated to be valid. Our features are the HPP and its two transformed versions, namely, eigen and Fisher profiles. Four well-known classifiers, namely, nearest neighbor, linear discriminant function, SVMs and artificial neural networks, have been considered, and the efficiency of combining these classifiers with the above features is compared. A sequential floating feature selection technique has been adopted to enhance the efficiency of the separation task. The results give an average accuracy of about 96%.
  • Keywords
    document image processing; eigenvalues and eigenfunctions; neural nets; pattern classification; support vector machines; text analysis; Fisher profiles; OCR; artificial neural networks; document images; eigen profile; feature classification; length-normalized horizontal projection profile; linear discriminant function; nearest neighbor; printed text blocks separation; printed text recognition; sequential floating feature selection; support vector machine; Artificial neural networks; Data mining; Hidden Markov models; Image analysis; Information analysis; Laboratories; Nearest neighbor searches; Optical character recognition software; Shape; Text recognition; Eigen profiles; horizontal projection profile
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Sensing and Information Processing, 2006. ICISIP 2006. Fourth International Conference on
  • Conference_Location
    Bangalore
  • Print_ISBN
    1-4244-0612-9
  • Electronic_ISBN
    1-4244-0612-9
  • Type

    conf

  • DOI
    10.1109/ICISIP.2006.4286061
  • Filename
    4286061
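
The abstract's starting feature, the length-normalized horizontal projection profile (HPP), can be illustrated with a minimal sketch. This is not the authors' implementation: the normalization used here (dividing each row sum by the block width, then resampling the profile to a fixed number of bins by linear interpolation) is an assumption, as are the function names, the `n_bins` parameter, and the synthetic test blocks. Only the nearest-neighbor classifier from the abstract is shown; the eigen and Fisher profiles, the other classifiers, and the feature selection step are omitted.

```python
import numpy as np


def length_normalized_hpp(block, n_bins=32):
    """Return a fixed-length horizontal projection profile.

    block: 2-D array where nonzero pixels are ink.
    Assumed normalization: fraction of ink per row, resampled to n_bins.
    """
    ink = np.asarray(block) != 0
    rows, cols = ink.shape
    hpp = ink.sum(axis=1) / float(cols)      # ink fraction in each pixel row
    src = np.linspace(0.0, 1.0, rows)        # original row positions
    dst = np.linspace(0.0, 1.0, n_bins)      # resampled positions
    return np.interp(dst, src, hpp)


def nearest_neighbor_label(query, train_feats, train_labels):
    """Label the query with the label of the closest training profile (Euclidean)."""
    dists = np.linalg.norm(np.asarray(train_feats) - query, axis=1)
    return train_labels[int(np.argmin(dists))]


def synthetic_text_block(rows=60, cols=80, line_height=6, gap=4):
    """Hypothetical stand-in for printed text: evenly spaced solid lines."""
    block = np.zeros((rows, cols), dtype=np.uint8)
    for r in range(rows):
        if r % (line_height + gap) < line_height:
            block[r, :] = 1
    return block


def synthetic_blob_block(rows=60, cols=80):
    """Hypothetical stand-in for a logo or signature: one central blob."""
    block = np.zeros((rows, cols), dtype=np.uint8)
    block[rows // 4: 3 * rows // 4, cols // 4: 3 * cols // 4] = 1
    return block


if __name__ == "__main__":
    feats = np.stack([
        length_normalized_hpp(synthetic_text_block()),
        length_normalized_hpp(synthetic_blob_block()),
    ])
    labels = ["text", "non-text"]
    query = length_normalized_hpp(synthetic_text_block(cols=120))
    print(nearest_neighbor_label(query, feats, labels))
```

The periodic peaks and valleys in the text block's profile are the regularity the abstract refers to; a real evaluation would use profiles from scanned document blocks rather than these synthetic arrays.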