• DocumentCode
    1634561
  • Title

    A New Arabic Printed Text Image Database and Evaluation Protocols

  • Author

    Slimane, Fouad ; Ingold, Rolf ; Kanoun, Slim ; Alimi, Adel M. ; Hennebert, Jean

  • Author_Institution
    Dept. of Inf., Univ. of Fribourg, Fribourg, Switzerland
  • fYear
    2009
  • Firstpage
    946
  • Lastpage
    950
  • Abstract
    We report on the creation of a database composed of images of Arabic printed words. The purpose of this database is the large-scale benchmarking of open-vocabulary, multi-font, multi-size and multi-style text recognition systems in Arabic. The challenges addressed by the database lie in the variability of the sizes, fonts and styles used to generate the images. Focus is also given to low-resolution images, where anti-aliasing generates noise on the characters to be recognized. The database is synthetically generated using a lexicon of 113,284 words, 10 Arabic fonts, 10 font sizes and 4 font styles. The database contains 45,313,600 single-word images totaling more than 250 million characters. Ground truth annotation is provided for each image. The database is called APTI, for Arabic Printed Text Images.
  • Keywords
    benchmark testing; image recognition; image resolution; natural language processing; protocols; text analysis; visual databases; Arabic printed text image database; Arabic printed text image evaluation protocols; ground truth annotation; large-scale benchmarking; low-resolution images; text recognition systems; Character generation; Character recognition; Focusing; Image databases; Image generation; Image recognition; Large-scale systems; Noise generators; Protocols; Text recognition; Arabic Text Recognition System; OCR; benchmarking; text image databases;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
  • Conference_Location
    Barcelona
  • ISSN
    1520-5363
  • Print_ISBN
    978-1-4244-4500-4
  • Electronic_ISBN
    1520-5363
  • Type

    conf

  • DOI
    10.1109/ICDAR.2009.155
  • Filename
    5277558
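
The image count quoted in the abstract can be checked against the generation parameters it lists. The sketch below assumes that every lexicon word is rendered once in every font, size and style combination; the abstract does not state this explicitly, but the product matches the reported total. The function name `expected_image_count` is hypothetical and used only for illustration.

```python
# Figures quoted in the abstract.
LEXICON_WORDS = 113_284
FONTS = 10
FONT_SIZES = 10
FONT_STYLES = 4


def expected_image_count(words: int, fonts: int, sizes: int, styles: int) -> int:
    """Return the number of images produced if each word is rendered once
    per (font, size, style) combination (an assumption, see the note above)."""
    return words * fonts * sizes * styles


total = expected_image_count(LEXICON_WORDS, FONTS, FONT_SIZES, FONT_STYLES)
print(f"{total:,}")  # 45,313,600
```

The result agrees with the 45,313,600 single-word images reported in the abstract.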