• DocumentCode
    183210
  • Title
    An Historical Handwritten Arabic Dataset for Segmentation-Free Word Spotting - HADARA80P
  • Author
    Pantke, Werner ; Dennhardt, Martin ; Fecker, Daniel ; Märgner, Volker ; Fingscheidt, Tim
  • Author_Institution
    Inst. for Commun. Technol., Tech. Univ. Braunschweig, Braunschweig, Germany
  • fYear
    2014
  • fDate
    1-4 Sept. 2014
  • Firstpage
    15
  • Lastpage
    20
  • Abstract
    In this paper, we present a new and freely available dataset comprising 80 pages of an historical handwritten Arabic document, together with a detailed ground truth, for the development and evaluation of segmentation-free word spotting approaches. Besides information on the underlying manuscript and technical details, we introduce a comprehensive list of tags with which each word is labeled. These tags can be used for research on specific issues such as dealing with text in different colors. For comparison of different word spotters, a fixed set of 25 keywords with different properties is included. Furthermore, some specifics of spotting on Arabic manuscripts are discussed. As an example, we present a state-of-the-art word spotting algorithm in its original and a new extended implementation and evaluate both approaches on the new dataset. For comparison, they are also tested on the widely used George Washington dataset. It is shown that the extended word spotter outperforms the original version in terms of mean average precision on both datasets.
  • Keywords
    document image processing; handwritten character recognition; image segmentation; natural language processing; Arabic manuscripts; George Washington dataset; HADARA80P; historical handwritten Arabic document; segmentation-free word spotting; Books; Image color analysis; Image resolution; Image segmentation; Shape; Standards; Writing; dataset; evaluation; historical Arabic handwriting; segmentation-free; word spotting
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
  • Conference_Location
    Heraklion
  • ISSN
    2167-6445
  • Print_ISBN
    978-1-4799-4335-7
  • Type
    conf
  • DOI
    10.1109/ICFHR.2014.11
  • Filename
    6980990