• DocumentCode
    712893
  • Title

    AUT-PFT: A real world printed Farsi text image dataset

  • Author

    Torabzadeh, Saeed ; Safabaksh, Reza

  • Author_Institution
    Comput. Eng. Dept., Amirkabir Univ. of Technol., Tehran, Iran
  • fYear
    2015
  • fDate
    3-5 March 2015
  • Firstpage
    267
  • Lastpage
    272
  • Abstract
    A comprehensive database of printed Farsi text is an essential resource for research in Farsi text recognition. Although some Arabic printed-text databases exist, they do not have all the features needed for Farsi or Arabic text recognition research. In this paper, we introduce a comprehensive Farsi printed text database called AUT-PFT. The purpose of this database is to provide a large-scale, real-world, multi-font, and multi-size corpus for training Farsi or Arabic text recognition systems. The database consists of 10,000 generated words. These words use 127 unique glyphs, chosen so that the glyphs appear with approximately uniform frequency. The words are rendered in 10 widely used Farsi fonts and 4 different font sizes. To introduce real-world noise, all generated images were printed and scanned. Ground truth data are also provided, and unlike other databases, detailed information about the document text is provided at the glyph level.
  • Keywords
    character recognition; document image processing; image recognition; natural languages; AUT-PFT; Arabic printed database; Arabic text recognition; Farsi text recognition; glyph level; multifont corpus; multisize corpus; printed Farsi text image dataset; Computers; Databases; Noise; Optical character recognition software; Text recognition; Training; XML; AUT-PFT; Farsi printed text; database; ground truth;
  • fLanguage
    English
  • Publisher
    ieee
  • Conference_Titel
    Artificial Intelligence and Signal Processing (AISP), 2015 International Symposium on
  • Conference_Location
    Mashhad
  • Print_ISBN
    978-1-4799-8817-4
  • Type

    conf

  • DOI
    10.1109/AISP.2015.7123490
  • Filename
    7123490