• DocumentCode
    3569557
  • Title
    Automatic language identification using large vocabulary continuous speech recognition
  • Author
    Mendoza, Sergio ; Gillick, Larry ; Ito, Yoshiko ; Lowe, Stephen ; Newman, Michael
  • Author_Institution
    Dragon Systems, Inc., Newton, MA, USA
  • Volume
    2
  • fYear
    1996
  • Firstpage
    785
  • Abstract
    We have developed a highly accurate automatic language identification system based on large vocabulary continuous speech recognition (LVCSR). Each test utterance is recognized in a number of languages, and the language ID decision is based on the probability of the output word sequence reported by each recognizer. Recognizers were implemented for this test in English, Japanese, and Spanish, using the Ricardo corpus of telephone monologues. When tested on the OGI corpus of digitally recorded telephone speech, we obtained error rates of 3% or lower on 2-way and 3-way closed-set classification of ten-second and one-minute speech segments.
  • Keywords
    natural languages; pattern classification; speech recognition; 2-way closed-set classification; 3-way closed-set classification; English; Japanese; LVCSR; OGI corpus; Ricardo corpus; Spanish; automatic language identification; automatic language identification system; digitally recorded telephone speech; error rates; language ID decision; large vocabulary continuous speech recognition; output word sequence; speech segments; telephone monologues; test utterance; Data mining; Error analysis; Indium tin oxide; Natural languages; Routing; Speech recognition; Target recognition; Telephony; Testing; Vocabulary
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    1996 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP-96), Conference Proceedings
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-3192-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.1996.543238
  • Filename
    543238