Title :
A New Dataset of Persian Handwritten Documents and Its Segmentation
Author :
Alaei, Alireza ; Nagabhushan, P. ; Pal, Umapada
Author_Institution :
Dept. of Studies in Comput. Sci., Univ. of Mysore, Mysore, India
Abstract :
In document image analysis, and especially in handwritten document image recognition, standard datasets play a vital role in evaluating the performance of algorithms and comparing results obtained by different research groups. In this paper, an unconstrained Persian handwritten text dataset (PHTD) is introduced. The PHTD contains 140 handwritten documents in three different categories, written by 40 individuals. The total numbers of text-lines and words/subwords in the dataset are 1787 and 27073, respectively. Most of the PHTD documents contain overlapping or touching text-lines. The average number of text-lines per document in the PHTD is 13. Two types of ground truth, based on pixel information and content information, are generated for the dataset. With these two types of ground truth, the PHTD can be utilized in many areas of document image processing, such as sentence recognition/understanding, text-line segmentation, word segmentation, word recognition, and character segmentation. To provide a framework for other researchers, recent text-line segmentation results on this dataset are also reported.
Keywords :
document image processing; handwriting recognition; image segmentation; natural language processing; text analysis; PHTD; Persian handwritten documents; Persian handwritten text dataset; character segmentation; content information; document image analysis; handwritten document image recognition; image recognition; pixel information; text line number; text line segmentation; word recognition; word segmentation; Character recognition; Databases; Text recognition
Conference_Title :
Machine Vision and Image Processing (MVIP), 2011 7th Iranian
Conference_Location :
Tehran
Print_ISBN :
978-1-4577-1533-4
DOI :
10.1109/IranianMVIP.2011.6121553