• DocumentCode
    2021991
  • Title

    Annotated Databases for the Recognition of Screen-Rendered Text

  • Author

    Wachenfeld, Steffen ; Klein, Hans-Ulrich ; Jiang, Xiaoyi

  • Author_Institution
    Univ. of Münster, Münster
  • Volume
    1
  • fYear
    2007
  • fDate
    23-26 Sept. 2007
  • Firstpage
    272
  • Lastpage
    276
  • Abstract
    The recognition of screen-rendered text is a novel task. It is performed, for example, by translation tools that let users click on any text on the screen and receive a translation. Some commercial OCR programs have also begun to address the problem of reading screenshots. Optical character recognition on screenshot images can be very challenging because the fonts are very small and smoothed. To build and compare recognition approaches for screen-rendered text, the availability of standard databases is a fundamental prerequisite. This paper presents two freely available databases: one consisting of annotated screenshot images of 28,080 single characters, and another holding 400 words extracted from documents plus 2,400 generated isolated words. Both databases include meta-information such as x-height, font type, style, and rendering conditions. Using a developed recognition system as an example, it is shown how these databases can serve for training, testing, and optimization.
  • Keywords
    optical character recognition; text analysis; visual databases; OCR programs; annotated databases; screen-rendered text recognition; screen-shot images; translation tools; Application software; Character generation; Character recognition; Image databases; Operating systems; Optical character recognition software; Pattern recognition; Rendering (computer graphics); System testing; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2007. ICDAR 2007. Ninth International Conference on
  • Conference_Location
    Parana
  • ISSN
    1520-5363
  • Print_ISBN
    978-0-7695-2822-9
  • Type

    conf

  • DOI
    10.1109/ICDAR.2007.4378718
  • Filename
    4378718