Automatic language identification using large vocabulary continuous speech recognition

Author

Mendoza, Sergio ; Gillick, Larry ; Ito, Yoshiko ; Lowe, Stephen ; Newman, Michael

Author_Institution

Dragon Syst. Inc., Newton, MA, USA

Volume

2

fYear

1996

Firstpage

785

Abstract

We have developed a highly accurate automatic language identification system based on large vocabulary continuous speech recognition (LVCSR). Each test utterance is recognized in a number of languages, and the language ID decision is based on the probability of the output word sequence reported by each recognizer. Recognizers were implemented for this test in English, Japanese, and Spanish, using the Ricardo corpus of telephone monologues. When tested on the OGI corpus of digitally recorded telephone speech, we obtained error rates of 3% or lower on 2-way and 3-way closed-set classification of ten-second and one-minute speech segments.
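The decision rule described in the abstract (run one recognizer per language, then pick the language whose recognizer reports the most probable output word sequence) can be sketched as follows. This is a minimal illustration, not the authors' implementation. It assumes each recognizer returns a log probability for its best word sequence, and it treats those scores as directly comparable across languages. The function name identify_language and the scores in the usage example are hypothetical. The posterior computation assumes equal language priors, which the abstract does not state.

```python
import math
from typing import Dict, Tuple


def identify_language(log_probs: Dict[str, float]) -> Tuple[str, Dict[str, float]]:
    """Pick the language whose recognizer gave the highest log probability.

    log_probs: hypothetical mapping from language name to the log probability
    of the output word sequence reported by that language's recognizer.
    Returns the chosen language and per-language posteriors, assuming
    equal priors over the candidate languages (an assumption, not from the paper).
    """
    if not log_probs:
        raise ValueError("need at least one recognizer score")

    # Closed-set decision: argmax over the candidate languages.
    best = max(log_probs, key=log_probs.get)

    # Posteriors under equal priors: softmax of the log scores,
    # shifted by the maximum (log-sum-exp trick) to avoid underflow.
    m = max(log_probs.values())
    z = sum(math.exp(v - m) for v in log_probs.values())
    posteriors = {lang: math.exp(v - m) / z for lang, v in log_probs.items()}
    return best, posteriors


if __name__ == "__main__":
    # Made-up scores for a 3-way English/Japanese/Spanish test.
    scores = {"English": -1520.4, "Japanese": -1498.7, "Spanish": -1533.9}
    language, post = identify_language(scores)
    print(language)  # Japanese
```

Restricting the dictionary to two languages gives the 2-way closed-set case; the same argmax applies.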

Keywords

natural languages; pattern classification; speech recognition; 2-way closed-set classification; 3-way closed-set classification; English; Japanese; LVCSR; OGI corpus; Ricardo corpus; Spanish; automatic language identification; automatic language identification system; digitally recorded telephone speech; error rates; language ID decision; large vocabulary continuous speech recognition; output word sequence; speech segments; telephone monologues; test utterance; Data mining; Error analysis; Natural languages; Routing; Speech recognition; Target recognition; Telephony; Testing; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Title

Acoustics, Speech, and Signal Processing, 1996. ICASSP-96. Conference Proceedings., 1996 IEEE International Conference on

ISSN

1520-6149

Print_ISBN

0-7803-3192-3

Type

conf

DOI

10.1109/ICASSP.1996.543238

Filename

543238