DocumentCode :
2329947
Title :
Sub-lexical language models for German LVCSR
Author :
El-Desoky Mousa, Amr ; Shaik, M. Ali Basha ; Schlüter, Ralf ; Ney, Hermann
Author_Institution :
Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany
fYear :
2010
fDate :
12-15 Dec. 2010
Firstpage :
171
Lastpage :
176
Abstract :
One of the major difficulties in German LVCSR is the rich morphology of German, which leads to high out-of-vocabulary (OOV) rates and high language model (LM) perplexities. Compound words make up a substantial fraction of the German vocabulary, and most compound OOVs are composed of frequent in-vocabulary words. Here, we investigate the use of sub-lexical LMs based on different approaches to word decomposition, namely supervised and unsupervised decomposition, as well as decomposition derived from grapheme-to-phoneme (G2P) conversion. In the latter approach, we augment a normal word model with a set of grapheme-phoneme pairs, called graphones, which are used to model the OOV words. A novel approach is proposed to select the representative graphone sequences for OOVs based on unsupervised decomposition and word-pronunciation alignment. We obtain relative reductions in word error rate (WER) from 4.2% to 6.5% with respect to a comparable full-word system.
Keywords :
natural language processing; speech recognition; vocabulary; German LVCSR; grapheme-to-phoneme conversion; graphones; high language model perplexities; out-of-vocabulary rates; sublexical language models; German; Speech recognition; graphone; language model; sub-lexical;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Spoken Language Technology Workshop (SLT), 2010 IEEE
Conference_Location :
Berkeley, CA
Print_ISBN :
978-1-4244-7904-7
Electronic_ISBN :
978-1-4244-7902-3
Type :
conf
DOI :
10.1109/SLT.2010.5700846
Filename :
5700846