DocumentCode
183210
Title
An Historical Handwritten Arabic Dataset for Segmentation-Free Word Spotting - HADARA80P
Author
Pantke, Werner ; Dennhardt, Martin ; Fecker, Daniel ; Margner, Volker ; Fingscheidt, Tim
Author_Institution
Inst. for Commun. Technol., Tech. Univ. Braunschweig, Braunschweig, Germany
fYear
2014
fDate
1-4 Sept. 2014
Firstpage
15
Lastpage
20
Abstract
In this paper, we present a new and freely available dataset comprising 80 pages of an historical handwritten Arabic document in conjunction with a detailed ground truth for the development and evaluation of segmentation-free word spotting approaches. Besides information on the underlying manuscript and technical details, we introduce a comprehensive list of tags that each word is labeled with. These tags can be used for research on specific issues such as dealing with text in different colors. For comparison of different word spotters, a fixed set of 25 keywords with different properties is included. Furthermore, some specifics of spotting on Arabic manuscripts are discussed. As an example, we present a state-of-the-art word spotting algorithm in its original and a new extended implementation and evaluate both approaches on the new dataset. For comparison, they are also tested on the widely used George Washington dataset. It is shown that the extended word spotter outperforms the original version in terms of mean average precision on both datasets.
Keywords
document image processing; handwritten character recognition; image segmentation; natural language processing; Arabic manuscripts; George Washington dataset; HADARA80P; historical handwritten Arabic document; segmentation-free word spotting; Books; Image color analysis; Image resolution; Image segmentation; Shape; Standards; Writing; dataset; evaluation; historical Arabic handwriting; segmentation-free; word spotting
fLanguage
English
Publisher
IEEE
Conference_Titel
Frontiers in Handwriting Recognition (ICFHR), 2014 14th International Conference on
Conference_Location
Heraklion
ISSN
2167-6445
Print_ISBN
978-1-4799-4335-7
Type
conf
DOI
10.1109/ICFHR.2014.11
Filename
6980990