DocumentCode
712893
Title
AUT-PFT: A real world printed Farsi text image dataset
Author
Torabzadeh, Saeed ; Safabaksh, Reza
Author_Institution
Comput. Eng. Dept., Amirkabir Univ. of Technol., Tehran, Iran
fYear
2015
fDate
3-5 March 2015
Firstpage
267
Lastpage
272
Abstract
A comprehensive database of printed Farsi text is an essential resource for research on Farsi text recognition. Although some printed Arabic databases exist, they do not have all the features necessary for Farsi or Arabic text recognition research. In this paper, we introduce a comprehensive printed Farsi text database called AUT-PFT. The purpose of this database is to provide a large-scale, real-world, multi-font, multi-size corpus for training Farsi or Arabic text recognition systems. The database consists of 10,000 generated words. These words use 127 unique glyphs, chosen so that the frequency distribution of glyphs is approximately uniform. The words are rendered in 10 widely used Farsi fonts at 4 different font sizes. To introduce real-world noise, all generated images were printed and scanned. Ground truth data are also provided and, unlike other databases, include detailed information about the document text at the glyph level.
Keywords
character recognition; document image processing; image recognition; natural languages; AUT-PFT; Arabic printed database; Arabic text recognition; Farsi text recognition; glyph level; multifont corpus; multisize corpus; printed Farsi text image dataset; Computers; Databases; Noise; Optical character recognition software; Text recognition; Training; XML; AUT-PFT; Farsi printed text; database; ground truth;
fLanguage
English
Publisher
IEEE
Conference_Titel
Artificial Intelligence and Signal Processing (AISP), 2015 International Symposium on
Conference_Location
Mashhad
Print_ISBN
978-1-4799-8817-4
Type
conf
DOI
10.1109/AISP.2015.7123490
Filename
7123490