DocumentCode
3569557
Title
Automatic language identification using large vocabulary continuous speech recognition
Author
Mendoza, Sergio ; Gillick, Larry ; Ito, Yoshiko ; Lowe, Stephen ; Newman, Michael
Author_Institution
Dragon Syst. Inc., Newton, MA, USA
Volume
2
fYear
1996
Firstpage
785
Abstract
We have developed a highly accurate automatic language identification system based on large vocabulary continuous speech recognition (LVCSR). Each test utterance is recognized in a number of languages, and the language ID decision is based on the probability of the output word sequence reported by each recognizer. Recognizers were implemented for this test in English, Japanese, and Spanish, using the Ricardo corpus of telephone monologues. When tested on the OGI corpus of digitally recorded telephone speech, we obtained error rates of 3% or lower on 2-way and 3-way closed-set classification of ten-second and one-minute speech segments.
Keywords
natural languages; pattern classification; speech recognition; 2-way closed-set classification; 3-way closed-set classification; English; Japanese; LVCSR; OGI corpus; Ricardo corpus; Spanish; automatic language identification; automatic language identification system; digitally recorded telephone speech; error rates; language ID decision; large vocabulary continuous speech recognition; output word sequence; speech segments; telephone monologues; test utterance; Data mining; Error analysis; Indium tin oxide; Natural languages; Routing; Speech recognition; Target recognition; Telephony; Testing; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on
ISSN
1520-6149
Print_ISBN
0-7803-3192-3
Type
conf
DOI
10.1109/ICASSP.1996.543238
Filename
543238