Title :
Towards identification of very low resolution, anti-aliased characters
Author :
Einsele, Farshideh ; Hennebert, Jean ; Ingold, Rolf
Author_Institution :
Dept. of Inf., Univ. of Fribourg, Fribourg
Abstract :
Current Web indexing technologies suffer from a severe drawback: Web documents often present textual information that is encapsulated in digital images and is therefore not available as actual coded text. Moreover, such images are poorly suited to existing OCR software, which is generally designed to recognize binary document images produced by scanners at resolutions between 200 and 600 dpi, whereas text embedded in Web images is often anti-aliased and generally has a resolution between 72 and 90 dpi. This paper describes two preliminary studies on character identification at very low resolution (72 dpi) and small font sizes (3-12 pt). The proposed character identification system delivers identification rates of up to 99.93% for 12,600 isolated character samples and up to 99.89% for 300,000 character samples in context.
Keywords :
Internet; antialiasing; data encapsulation; document image processing; image resolution; indexing; optical character recognition; text analysis; OCR; Web indexing technology; antialiased character identification; binary document image recognition; low resolution character; textual information encapsulation; Bayesian methods; Character recognition; Data mining; Databases; Image resolution; Indexing; Informatics; Optical character recognition software; Rendering (computer graphics); Text recognition;
Conference_Title :
Signal Processing and Its Applications, 2007. ISSPA 2007. 9th International Symposium on
Conference_Location :
Sharjah
Print_ISBN :
978-1-4244-0778-1
Electronic_ISBN :
978-1-4244-1779-8
DOI :
10.1109/ISSPA.2007.4555324