• DocumentCode
    1993086
  • Title
    Two approaches for text segmentation in Web images
  • Author
    Karatzas, D. ; Antonacopoulos, A.
  • Author_Institution
    Dept. of Comput. Sci., Liverpool Univ., UK
  • fYear
    2003
  • fDate
    3-6 Aug. 2003
  • Firstpage
    131
  • Abstract
    There is a significant need to recognise the text in images on Web pages, both for effective indexing and for presentation by non-visual means (e.g., audio). This paper presents and compares two novel methods for segmenting characters for subsequent extraction and recognition. The novelty of both approaches lies in combining topological features of characters (different in each case) with an anthropocentric perspective of colour perception, in preference to RGB space analysis. Both approaches enable the extraction of text in complex situations, such as when colour and texture vary in both the characters and the background.
  • Keywords
    Web sites; character recognition; document image processing; feature extraction; image segmentation; text analysis; RGB space analysis; Web images; character extraction; character recognition; colour perception; text extraction; text segmentation; Character recognition; Computer science; Image coding; Image recognition; Image resolution; Image segmentation; Indexing; Search engines; Text recognition; Web pages;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2003. Proceedings. Seventh International Conference on
  • Print_ISBN
    0-7695-1960-1
  • Type
    conf
  • DOI
    10.1109/ICDAR.2003.1227646
  • Filename
    1227646