Title :
Two approaches for text segmentation in Web images
Author :
Karatzas, D. ; Antonacopoulos, A.
Author_Institution :
Dept. of Comput. Sci., Liverpool Univ., UK
Abstract :
There is a significant need to recognise the text in images on Web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for segmenting characters for subsequent extraction and recognition. The novelty of both approaches lies in combining topological features of characters (different in each case) with an anthropocentric perspective of colour perception, in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations, such as when the characters and the background vary in colour and texture.
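The abstract does not specify either algorithm. As a rough illustration of pairing a topological criterion with a perceptually motivated colour measure instead of raw RGB distances, the sketch below converts sRGB pixels to CIE L*a*b* and grows 4-connected regions whose colour stays within a ΔE threshold of the seed pixel. This is an assumption for illustration only, not the authors' method: the function names (`rgb_to_lab`, `delta_e76`, `segment`), the CIE76 ΔE metric, the D65 white point and the threshold of 10 are all hypothetical choices.

```python
import math
from collections import deque

def srgb_to_linear(c):
    # Inverse sRGB companding for a channel value c in [0, 1].
    return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4

def rgb_to_lab(rgb):
    # Assumption: 8-bit sRGB input, D65 reference white.
    r, g, b = (srgb_to_linear(v / 255.0) for v in rgb)
    # Linear sRGB -> CIE XYZ (D65).
    x = 0.4124564 * r + 0.3575761 * g + 0.1804375 * b
    y = 0.2126729 * r + 0.7151522 * g + 0.0721750 * b
    z = 0.0193339 * r + 0.1191920 * g + 0.9503041 * b
    xn, yn, zn = 0.95047, 1.0, 1.08883  # D65 white point

    def f(t):
        delta = 6 / 29
        return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29

    fx, fy, fz = f(x / xn), f(y / yn), f(z / zn)
    return (116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz))

def delta_e76(c1, c2):
    # CIE76 colour difference: Euclidean distance in L*a*b*.
    return math.dist(rgb_to_lab(c1), rgb_to_lab(c2))

def segment(image, threshold=10.0):
    # Hypothetical region growing: a pixel joins a region if it is
    # 4-connected to it and perceptually close to the region's seed.
    h, w = len(image), len(image[0])
    lab = [[rgb_to_lab(p) for p in row] for row in image]
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for sy in range(h):
        for sx in range(w):
            if labels[sy][sx] != -1:
                continue
            seed = lab[sy][sx]
            labels[sy][sx] = next_label
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if (0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1
                            and math.dist(lab[ny][nx], seed) < threshold):
                        labels[ny][nx] = next_label
                        queue.append((ny, nx))
            next_label += 1
    return labels, next_label
```

For example, a 3×3 image with a white vertical stroke on a black background yields three regions: the stroke and the two background columns, which are not 4-connected to each other. Measuring similarity in L*a*b* rather than RGB is one common way to approximate how differences in colour are perceived; the seed-based comparison avoids gradual drift across slowly varying colours, at the cost of splitting strongly graded regions.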
Keywords :
Web sites; character recognition; document image processing; feature extraction; image segmentation; text analysis; RGB space analysis; Web images; character extraction; colour perception; text extraction; text segmentation; Character recognition; Computer science; Image coding; Image recognition; Image resolution; Image segmentation; Indexing; Search engines; Text recognition; Web pages
Conference_Title :
Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
Print_ISBN :
0-7695-1960-1
DOI :
10.1109/ICDAR.2003.1227646