Title :
An Historical Handwritten Arabic Dataset for Segmentation-Free Word Spotting - HADARA80P
Author :
Pantke, Werner ; Dennhardt, Martin ; Fecker, Daniel ; Märgner, Volker ; Fingscheidt, Tim
Author_Institution :
Institute for Communications Technology, Technische Universität Braunschweig, Braunschweig, Germany
Abstract :
In this paper, we present a new, freely available dataset comprising 80 pages of an historical handwritten Arabic document, together with a detailed ground truth, for the development and evaluation of segmentation-free word spotting approaches. Besides information on the underlying manuscript and technical details, we introduce a comprehensive list of tags with which each word is labeled. These tags can be used for research on specific issues such as dealing with text in different colors. To allow comparison of different word spotters, a fixed set of 25 keywords with different properties is included. Furthermore, some specifics of spotting on Arabic manuscripts are discussed. As an example, we present a state-of-the-art word spotting algorithm in its original implementation and in a new, extended implementation, and evaluate both on the new dataset. For comparison, they are also tested on the widely used George Washington dataset. The extended word spotter outperforms the original version in terms of mean average precision on both datasets.
Keywords :
document image processing; handwritten character recognition; image segmentation; natural language processing; Arabic manuscripts; George Washington dataset; HADARA80P; historical handwritten Arabic document; segmentation-free word spotting; Books; Image color analysis; Image resolution; Image segmentation; Shape; Standards; Writing; dataset; evaluation; historical Arabic handwriting; segmentation-free; word spotting
Conference_Title :
2014 14th International Conference on Frontiers in Handwriting Recognition (ICFHR)
Conference_Location :
Heraklion
Print_ISBN :
978-1-4799-4335-7
DOI :
10.1109/ICFHR.2014.11