• DocumentCode
    2911489
  • Title
    A New Dataset of Persian Handwritten Documents and Its Segmentation
  • Author
    Alaei, Alireza ; Nagabhushan, P. ; Pal, Umapada
  • Author_Institution
    Dept. of Studies in Comput. Sci., Univ. of Mysore, Mysore, India
  • fYear
    2011
  • fDate
    16-17 Nov. 2011
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    In document image analysis, and especially in handwritten document image recognition, standard datasets play a vital role in evaluating the performance of algorithms and in comparing results obtained by different research groups. In this paper, an unconstrained Persian handwritten text dataset (PHTD) is introduced. The PHTD contains 140 handwritten documents in three different categories, written by 40 individuals. In total, the dataset contains 1787 text-lines and 27073 words/subwords. Most PHTD documents contain overlapping or touching text-lines. The average number of text-lines per document is about 13. Two types of ground truth, based on pixel information and on content information, are generated for the dataset. With these two types of ground truth, the PHTD can be used in many areas of document image processing, such as sentence recognition/understanding, text-line segmentation, word segmentation, word recognition, and character segmentation. To provide a framework for other researchers, recent text-line segmentation results on this dataset are also reported.
  • Keywords
    document image processing; handwriting recognition; image segmentation; natural language processing; text analysis; PHTD; Persian handwritten documents; Persian handwritten text dataset; character segmentation; content information; document image analysis; handwritten document image recognition; image recognition; pixel information; text line number; text line segmentation; word recognition; word segmentation; Character recognition; Databases; Text recognition
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Machine Vision and Image Processing (MVIP), 2011 7th Iranian
  • Conference_Location
    Tehran
  • Print_ISBN
    978-1-4577-1533-4
  • Type
    conf
  • DOI
    10.1109/IranianMVIP.2011.6121553
  • Filename
    6121553