DocumentCode :
2060874
Title :
Extracting text from WWW images
Author :
Zhou, Jiangying ; Lopresti, Daniel
Author_Institution :
Panasonic Inf. & Networking Technol. Lab., Princeton, NJ, USA
Volume :
1
fYear :
1997
fDate :
18-20 Aug 1997
Firstpage :
248
Abstract :
The authors examine the problem of locating and extracting text from images on the World Wide Web. They describe a text detection algorithm based on color clustering and connected component analysis. The algorithm first quantizes the color space of the input image into a number of color classes using a parameter-free clustering procedure. It then identifies text-like connected components in each color class based on their shapes. Finally, a post-processing procedure aligns text-like components into text lines. Experimental results suggest this approach is promising despite the challenging nature of the input data.
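The three stages named in the abstract (color quantization, per-class connected component analysis with a shape test, and alignment into text lines) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract gives no algorithmic detail, so the parameter-free clustering is replaced here by plain bit truncation of each RGB channel, and every function name, shape threshold, and gap limit below is a hypothetical choice made only for illustration.

```python
from collections import deque

def quantize(pixel, bits=2):
    """Stand-in for the paper's clustering (assumption): keep the top `bits` bits per channel."""
    shift = 8 - bits
    return tuple(c >> shift for c in pixel)

def connected_components(image, bits=2):
    """Yield (color_class, (top, left, bottom, right), pixel_count) per 4-connected region."""
    h, w = len(image), len(image[0])
    classes = [[quantize(p, bits) for p in row] for row in image]
    seen = [[False] * w for _ in range(h)]
    for y0 in range(h):
        for x0 in range(w):
            if seen[y0][x0]:
                continue
            cls = classes[y0][x0]
            seen[y0][x0] = True
            queue = deque([(y0, x0)])
            top, left, bottom, right, count = y0, x0, y0, x0, 0
            while queue:  # breadth-first flood fill within one color class
                y, x = queue.popleft()
                count += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w
                            and not seen[ny][nx] and classes[ny][nx] == cls):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            yield cls, (top, left, bottom, right), count

def is_text_like(box, count, min_h=3, max_h=40, max_aspect=3.0, min_fill=0.2):
    """Hypothetical shape test: bounded height, not too elongated, reasonably filled."""
    top, left, bottom, right = box
    height, width = bottom - top + 1, right - left + 1
    fill = count / (height * width)
    return min_h <= height <= max_h and width / height <= max_aspect and fill >= min_fill

def group_into_lines(boxes, max_gap=6):
    """Merge boxes left to right when they overlap vertically and sit close together."""
    lines = []
    for box in sorted(boxes, key=lambda b: b[1]):  # sort by left edge
        top, left, bottom, right = box
        for line in lines:
            ltop, lleft, lbottom, lright = line["box"]
            overlaps = min(bottom, lbottom) >= max(top, ltop)
            if overlaps and left - lright <= max_gap:
                line["members"].append(box)
                line["box"] = (min(top, ltop), min(left, lleft),
                               max(bottom, lbottom), max(right, lright))
                break
        else:
            lines.append({"box": box, "members": [box]})
    return lines

def find_text_lines(image, bits=2):
    """Run all three stages; `image` is a list of rows of (r, g, b) tuples."""
    by_class = {}
    for cls, box, count in connected_components(image, bits):
        if is_text_like(box, count):
            by_class.setdefault(cls, []).append(box)
    return {cls: group_into_lines(boxes) for cls, boxes in by_class.items()}
```

Calling `find_text_lines(image)` on a list of rows of `(r, g, b)` tuples returns, for each quantized color class, the candidate text lines with their bounding boxes and member components. Uniform bit truncation is a crude substitute for a real clustering step; it is used here only to keep the sketch dependency-free.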
Keywords :
Internet; document image processing; feature extraction; image colour analysis; WWW images; World Wide Web; color classes; color clustering; color space quantization; connected component analysis; input image; parameter-free clustering procedure; post-processing procedure; shapes; text detection algorithm; text extraction; text lines; text location; text-like component alignment; text-like connected components; Algorithm design and analysis; Clustering algorithms; Data mining; Detection algorithms; Face detection; Image color analysis; Laboratories; Shape; Web sites; World Wide Web;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Proceedings of the Fourth International Conference on Document Analysis and Recognition, 1997
Conference_Location :
Ulm
Print_ISBN :
0-8186-7898-4
Type :
conf
DOI :
10.1109/ICDAR.1997.619850
Filename :
619850