Regional Information Center for Science and Technology

DocumentCode :

2060874

Title :

Extracting text from WWW images

Author :

Zhou, Jiangying ; Lopresti, Daniel

Author_Institution :

Panasonic Inf. & Networking Technol. Lab., Princeton, NJ, USA

Volume :

fYear :

1997

fDate :

18-20 Aug 1997

Firstpage :

248

Abstract :

The authors examine the problem of locating and extracting text from images on the World Wide Web. They describe a text detection algorithm based on color clustering and connected component analysis. The algorithm first quantizes the color space of the input image into a number of color classes using a parameter-free clustering procedure. It then identifies text-like connected components in each color class based on their shapes. Finally, a post-processing procedure aligns text-like components into text lines. Experimental results suggest this approach is promising despite the challenging nature of the input data.
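The three stages named in the abstract (color quantization, shape-based filtering of connected components, alignment into lines) can be illustrated with a minimal sketch. This is not the authors' implementation: the paper's parameter-free clustering is replaced here by plain uniform bit truncation, and every function name and threshold (`bits`, `min_h`, `max_h`, `max_aspect`, `max_area_frac`) is a hypothetical choice made only for illustration. The image is assumed to be a list of rows of `(r, g, b)` tuples.

```python
from collections import deque

def quantize(pixel, bits=2):
    """Coarse color class: keep the top `bits` bits of each channel.
    Stand-in for the paper's parameter-free clustering (an assumption)."""
    shift = 8 - bits
    return tuple(c >> shift for c in pixel)

def connected_components(image):
    """Yield the pixel list of each 4-connected region of one color class."""
    h, w = len(image), len(image[0])
    classes = [[quantize(p) for p in row] for row in image]
    seen = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if seen[y][x]:
                continue
            cls, pixels, queue = classes[y][x], [], deque([(y, x)])
            seen[y][x] = True
            while queue:  # breadth-first flood fill
                cy, cx = queue.popleft()
                pixels.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and classes[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield pixels

def bounding_box(pixels):
    """Return (top, left, bottom, right), inclusive."""
    ys = [p[0] for p in pixels]
    xs = [p[1] for p in pixels]
    return min(ys), min(xs), max(ys), max(xs)

def is_text_like(box, image_area, min_h=3, max_h=40, max_aspect=5.0, max_area_frac=0.5):
    """Hypothetical shape test: plausible height, not too wide, not background-sized."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    return (min_h <= height <= max_h
            and width / height <= max_aspect
            and height * width <= max_area_frac * image_area)

def group_into_lines(boxes):
    """Greedy left-to-right grouping of boxes whose vertical extents overlap."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):
        for line in lines:
            top, _, bottom, _ = line[-1]
            if box[0] <= bottom and box[2] >= top:
                line.append(box)
                break
        else:
            lines.append([box])
    return lines

def detect_text_lines(image):
    """Run all three stages and return a list of lines, each a list of boxes."""
    area = len(image) * len(image[0])
    boxes = [bounding_box(px) for px in connected_components(image)]
    return group_into_lines([b for b in boxes if is_text_like(b, area)])
```

On a small white image containing two dark vertical strokes at the same height, `detect_text_lines` would return one line holding both stroke boxes, while the background region and isolated specks are rejected by the shape test. A real system would need a far more careful color clustering step and shape model than this sketch assumes.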

Keywords :

Internet; document image processing; feature extraction; image colour analysis; WWW images; World Wide Web; color classes; color clustering; color space quantization; connected component analysis; input image; parameter-free clustering procedure; post-processing procedure; shapes; text detection algorithm; text extraction; text lines; text location; text-like component alignment; text-like connected components; Algorithm design and analysis; Clustering algorithms; Data mining; Detection algorithms; Face detection; Image color analysis; Laboratories; Shape; Web sites; World Wide Web;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997

Conference_Location :

Ulm

Print_ISBN :

0-8186-7898-4

Type :

conf

DOI :

10.1109/ICDAR.1997.619850

Filename :

619850

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2060874