DocumentCode :
1637957
Title :
The GERMANA Database
Author :
Perez, Diego ; Tarazon, L. ; Serrano, N. ; Castro, F. ; Terrades, O. Ramos ; Juan, A.
Author_Institution :
DSIC/ITI, Univ. Politec. de Valencia, Valencia, Spain
fYear :
2009
Firstpage :
301
Lastpage :
305
Abstract :
A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. GERMANA is the result of digitising and annotating a 764-page Spanish manuscript from 1891, in which most pages only contain nearly calligraphed text written on ruled sheets of well-separated lines. To our knowledge, it is the first publicly available database for handwriting research, mostly written in Spanish and comparable in size to standard databases. Due to its sequential book structure, it is also well-suited for realistic assessment of interactive handwriting recognition systems. To provide baseline results for reference in future studies, empirical results are also reported, using standard techniques and tools for preprocessing, feature extraction, HMM-based image modelling, and language modelling.
Keywords :
document image processing; feature extraction; handwriting recognition; hidden Markov models; history; natural languages; text analysis; visual databases; GERMANA database; HMM-based image modelling; Spanish manuscript; calligraphed text; empirical comparison; feature extraction; handwritten text database; historical document collection; image preprocessing technique; interactive off-line handwriting recognition system; language modelling; ruled sheet; sequential book structure; text line extraction; Books; Feature extraction; Handwriting recognition; Hidden Markov models; Image databases; Natural languages; Software libraries; Spatial databases; Text analysis; Text recognition; corpus; datasets; handwriting recognition; historical documents; linguistic knowledge;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 2009. ICDAR '09. 10th International Conference on
Conference_Location :
Barcelona
ISSN :
1520-5363
Print_ISBN :
978-1-4244-4500-4
Electronic_ISBN :
1520-5363
Type :
conf
DOI :
10.1109/ICDAR.2009.10
Filename :
5277691