DocumentCode
384654
Title
Identification and removal of extraneous graphics in a commercial OCR operation
Author
Hashemi, Ray R. ; Epperson, Charlie ; Jones, Steve ; Jin, Lei ; Talburt, John
Author_Institution
Comput. Sci. Dept., Arkansas Univ., Little Rock, AR, USA
Volume
13
fYear
2002
fDate
2002
Firstpage
389
Lastpage
394
Abstract
The major issue in performing OCR on a document composed of a mixture of text and graphics (i.e., a mixed document) is the presence of graphics in the document. In this research effort, we propose two algorithms for the identification and removal of two special types of graphics, namely company logos and graphic displays with broken boundaries. A prototype was built and its performance evaluated on a test set of 198 scanned images of mixed documents. The prototype was able to remove 100% of the two types of graphics from the images.
Keywords
optical character recognition; broken boundaries; commercial OCR operation; company logos; extraneous graphics identification; extraneous graphics removal; mixed document; Computer graphics; Computer science; Displays; Image analysis; Image enhancement; Optical character recognition software; Pattern recognition; Prototypes; Testing; Text analysis
fLanguage
English
Publisher
IEEE
Conference_Titel
Proceedings of the 5th Biannual World Automation Congress, 2002
Print_ISBN
1-889335-18-5
Type
conf
DOI
10.1109/WAC.2002.1049574
Filename
1049574