• DocumentCode
    384654
  • Title

    Identification and removal of extraneous graphics in a commercial OCR operation

  • Author

    Hashemi, Ray R. ; Epperson, Charlie ; Jones, Steve ; Jin, Lei ; Talburt, John

  • Author_Institution
    Comput. Sci. Dept., Arkansas Univ., Little Rock, AR, USA
  • Volume
    13
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    389
  • Lastpage
    394
  • Abstract
    The major issue in OCRing a document that is composed of a mixture of text and graphics (i.e., a mixed document) is the presence of graphics in the document. In this research effort, we propose two algorithms for the identification and removal of two special types of graphics, namely, company logos and graphic displays with broken boundaries. A prototype was built and its performance evaluated on a test set of 198 scanned images of mixed documents. The prototype was able to remove 100% of the two types of graphics from the images.
  • Keywords
    optical character recognition; broken boundaries; commercial OCR operation; company logos; extraneous graphics identification; extraneous graphics removal; mixed document; Computer graphics; Computer science; Displays; Image analysis; Image enhancement; Optical character recognition software; Pattern recognition; Prototypes; Testing; Text analysis
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Automation Congress, 2002 Proceedings of the 5th Biannual World
  • Print_ISBN
    1-889335-18-5
  • Type

    conf

  • DOI
    10.1109/WAC.2002.1049574
  • Filename
    1049574