DocumentCode :
2630958
Title :
Using n-grams for the definition of a training set for cursive handwriting recognition
Author :
Pflug, Volkmar
Author_Institution :
Siemens AG, Germany
fYear :
1993
fDate :
20-22 Oct 1993
Firstpage :
295
Lastpage :
298
Abstract :
The use of n-grams for the selection of a minimal set of words from a lexicon for use as training words for a handwriting recognizer is presented. The test words selected should cover all or at least most of the graphemes in the section of the language considered. The algorithm reduces the number of test words by up to 70% of the original lexicon size when considering quadgrams. A further reduction is achieved by neglecting rare n-grams; the reduction then reaches up to 80% for quadgrams. Thus, only 20% of the number of words in the original lexicon have to be trained. Another aspect that may be considered when building the n-grams is that in natural handwriting a word ending is usually less carefully written than the first part of a word. Therefore, n-grams should be longer at the end than at the beginning of a word.
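The selection idea in the abstract can be illustrated with a minimal sketch. The abstract does not state how the words are chosen, so the greedy set-cover heuristic below is an assumption and not necessarily the author's algorithm. The names `ngrams`, `select_training_words`, and `min_count` are hypothetical, and the rare-n-gram threshold is one possible reading of "neglecting rare n-grams".

```python
from collections import Counter


def ngrams(word, n):
    """Return the set of character n-grams of a word (empty if len(word) < n)."""
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def select_training_words(lexicon, n=4, min_count=1):
    """Greedily pick words until every retained n-gram is covered.

    Assumption: an n-gram is "rare" if it occurs in fewer than
    min_count distinct lexicon words; rare n-grams are ignored.
    """
    words = sorted(set(lexicon))  # deduplicate; sorting makes ties deterministic
    grams = {w: ngrams(w, n) for w in words}

    # Count the number of distinct words in which each n-gram occurs.
    counts = Counter(g for w in words for g in grams[w])
    uncovered = {g for g, c in counts.items() if c >= min_count}

    selected = []
    while uncovered:
        # Take the word that covers the most still-uncovered n-grams.
        best = max(words, key=lambda w: len(grams[w] & uncovered))
        selected.append(best)
        uncovered -= grams[best]
    return selected


lexicon = ["hand", "handle", "candle", "land"]
all_trigrams = select_training_words(lexicon, n=3)
frequent_only = select_training_words(lexicon, n=3, min_count=2)
```

On this toy lexicon, covering every trigram needs three of the four words, while ignoring trigrams that occur in only one word leaves a single training word. A greedy cover is not guaranteed to be minimal; it is used here only because it is short and easy to follow.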
Keywords :
document image processing; glossaries; handwriting recognition; optical character recognition; cursive handwriting recognition; graphemes; handwriting recognizer; lexicon; lexicon size; n-grams; natural handwriting; quadgrams; training set; training words; word ending; Character generation; Character recognition; Computer science; Databases; Educational institutions; Handwriting recognition; Natural languages; Speech processing; Testing; Writing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Document Analysis and Recognition, 1993., Proceedings of the Second International Conference on
Conference_Location :
Tsukuba Science City
Print_ISBN :
0-8186-4960-7
Type :
conf
DOI :
10.1109/ICDAR.1993.395728
Filename :
395728