• DocumentCode
    3414565
  • Title
    Text Extraction from Document Images Using Edge Information
  • Author
    Grover, Sachin ; Arora, Kushal ; Mitra, Suman K.
  • Author_Institution
    Nat. Inst. of Technol., Rourkela, India
  • fYear
    2009
  • fDate
    18-20 Dec. 2009
  • Firstpage
    1
  • Lastpage
    4
  • Abstract
    Detecting text embedded in complex colored document images is a very challenging problem. Text extraction has many potential uses, including image searching and document archiving. In this paper, we propose a simple edge-based feature to perform this task. It aims at detecting textual regions in the document and separating them from the graphics portion. The algorithm relies on the sharp edges of characters, which are missing in images. We find these edges and use them to distinguish text from images. This edge information can also be used for other image interpretation tasks.
  • Keywords
    character recognition; document image processing; edge detection; feature extraction; image classification; image colour analysis; image texture; text analysis; character sharp edge; colored document images; document archiving; edge based feature; edge information; image interpretation; image searching; text classification; text extraction; textual region detection; texture measure; Communications technology; Data mining; Fourier transforms; Image edge detection; Image segmentation; Indexing; Material storage; Merging; Periodic structures; Robustness;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    India Conference (INDICON), 2009 Annual IEEE
  • Conference_Location
    Gujarat
  • Print_ISBN
    978-1-4244-4858-6
  • Electronic_ISBN
    978-1-4244-4859-3
  • Type
    conf
  • DOI
    10.1109/INDCON.2009.5409409
  • Filename
    5409409