Title :
Automatic language identification using large vocabulary continuous speech recognition
Author :
Mendoza, Sergio ; Gillick, Larry ; Ito, Yoshiko ; Lowe, Stephen ; Newman, Michael
Author_Institution :
Dragon Syst. Inc., Newton, MA, USA
Abstract :
We have developed a highly accurate automatic language identification system based on large vocabulary continuous speech recognition (LVCSR). Each test utterance is recognized in a number of languages, and the language ID decision is based on the probability of the output word sequence reported by each recognizer. Recognizers were implemented for this test in English, Japanese, and Spanish, using the Ricardo corpus of telephone monologues. When tested on the OGI corpus of digitally recorded telephone speech, we obtained error rates of 3% or lower on 2-way and 3-way closed-set classification of ten-second and one-minute speech segments.
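The decision rule in the abstract (run one recognizer per language, then pick the language whose recognizer reports the most probable output word sequence) can be illustrated with a minimal Python sketch. It assumes each recognizer returns a single log-probability for its best word sequence and that these scores are directly comparable across languages; the abstract does not say whether the authors normalized scores by utterance length or applied language priors. The function names and all example scores below are hypothetical, not taken from the paper.

```python
from typing import Dict, Sequence


def identify_language(log_probs: Dict[str, float]) -> str:
    """Return the language whose recognizer reported the highest log-probability.

    log_probs maps a language name to the (hypothetical) log-probability of the
    best word sequence found by that language's LVCSR recognizer for one utterance.
    Restricting the dict to a subset of languages gives an N-way closed-set decision.
    """
    if not log_probs:
        raise ValueError("need at least one recognizer score")
    return max(log_probs, key=log_probs.get)


def closed_set_error_rate(decisions: Sequence[str], truths: Sequence[str]) -> float:
    """Fraction of test segments whose chosen language differs from the true one."""
    if len(decisions) != len(truths) or not decisions:
        raise ValueError("decisions and truths must be non-empty and equal length")
    errors = sum(1 for d, t in zip(decisions, truths) if d != t)
    return errors / len(decisions)


if __name__ == "__main__":
    # Hypothetical per-language scores for a single test segment.
    scores = {"English": -4210.5, "Japanese": -4388.2, "Spanish": -4302.9}
    print(identify_language(scores))  # 3-way decision
    two_way = {k: scores[k] for k in ("Japanese", "Spanish")}
    print(identify_language(two_way))  # 2-way decision
```

With these made-up scores the 3-way decision is English and the 2-way Japanese/Spanish decision is Spanish. An error rate such as the 3% figure reported above is then simply the fraction of segments where the chosen language disagrees with the true one, as computed by `closed_set_error_rate`.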
Keywords :
natural languages; pattern classification; speech recognition; 2-way closed-set classification; 3-way closed-set classification; English; Japanese; LVCSR; OGI corpus; Ricardo corpus; Spanish; automatic language identification; automatic language identification system; digitally recorded telephone speech; error rates; language ID decision; large vocabulary continuous speech recognition; output word sequence; speech segments; telephone monologues; test utterance; Data mining; Error analysis; Natural languages; Routing; Speech recognition; Target recognition; Telephony; Testing; Vocabulary;
Conference_Title :
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
Print_ISBN :
0-7803-3192-3
DOI :
10.1109/ICASSP.1996.543238