DocumentCode
2911489
Title
A New Dataset of Persian Handwritten Documents and Its Segmentation
Author
Alaei, Alireza ; Nagabhushan, P. ; Pal, Umapada
Author_Institution
Dept. of Studies in Comput. Sci., Univ. of Mysore, Mysore, India
fYear
2011
fDate
16-17 Nov. 2011
Firstpage
1
Lastpage
5
Abstract
In document image analysis, and especially in handwritten document image recognition, standard datasets play a vital role in evaluating the performance of algorithms and comparing results obtained by different groups of researchers. In this paper, an unconstrained Persian handwritten text dataset (PHTD) is introduced. The PHTD contains 140 handwritten documents of three different categories written by 40 individuals. The total numbers of text-lines and words/subwords in the dataset are 1787 and 27073, respectively. Most of the PHTD documents contain overlapping or touching text-lines. The average number of text-lines per document in the PHTD is 13. Two types of ground truth, based on pixel information and content information, are generated for the dataset. With these two types of ground truth, the PHTD can be used in many areas of document image processing, such as sentence recognition/understanding, text-line segmentation, word segmentation, word recognition, and character segmentation. To provide a baseline for other researchers, recent text-line segmentation results on this dataset are also reported.
Keywords
document image processing; handwriting recognition; image segmentation; natural language processing; text analysis; PHTD; Persian handwritten documents; Persian handwritten text dataset; character segmentation; content information; document image analysis; document image processing; handwritten document image recognition; image recognition; image segmentation; pixel information; text line number; text line segmentation; word recognition; word segmentation; Character recognition; Databases; Handwriting recognition; Image recognition; Image segmentation; Text recognition;
fLanguage
English
Publisher
IEEE
Conference_Titel
Machine Vision and Image Processing (MVIP), 2011 7th Iranian
Conference_Location
Tehran
Print_ISBN
978-1-4577-1533-4
Type
conf
DOI
10.1109/IranianMVIP.2011.6121553
Filename
6121553