  • DocumentCode
    1684073
  • Title
    Automatic language identification in broadcast news
  • Author
    Backfried, Gerhard ; Rainoldi, Rainiero ; Riedler, Jürgen
  • Author_Institution
    Speech, Artificial Intelligence & Language Labs., Vienna, Austria
  • Volume
    2
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    1406
  • Lastpage
    1410
  • Abstract
    We present experiments on automatic language identification in the broadcast news domain. Because news broadcasts are inherently diverse, speech is first extracted from the raw audio data by phone-level decoding with broad phoneme classes. Training and testing were performed on recordings of German, English, Spanish and French news shows from a variety of European TV channels. Each language is characterized by a Gaussian mixture model built solely from that language's acoustic features. The overall average error rate on speech segments is 16.32%. The current system disregards almost all linguistic information and is therefore easily extensible to new languages.
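The abstract states that each language is modeled by a Gaussian mixture model trained only on acoustic features. The following is a minimal, hypothetical sketch of that general idea and not the authors' implementation. It assumes diagonal covariances, EM training, statistically independent frames, and a maximum-likelihood decision over languages. The component count, variance floor and synthetic "features" are illustrative assumptions, and the speech-extraction step (phone-level decoding with broad phoneme classes) is omitted. The sketch does not reproduce the reported 16.32% error rate.

```python
import numpy as np

# Hypothetical sketch (not the paper's system): one diagonal-covariance GMM per
# language, trained with EM on acoustic feature frames. A speech segment is
# assigned to the language whose GMM gives the highest total log-likelihood.

def _log_gauss_diag(X, means, variances):
    """Per-frame, per-component log N(x | mu_k, diag(var_k)); shape (T, K)."""
    diff = X[:, None, :] - means[None, :, :]  # (T, K, D)
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2 * np.pi * variances), axis=1)[None, :])

def _logsumexp(A, axis):
    """Numerically stable log(sum(exp(A))) along one axis."""
    m = np.max(A, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(A - m), axis=axis))

class DiagGMM:
    def __init__(self, n_components=8, n_iter=50, var_floor=1e-3, seed=0):
        # All defaults here are illustrative assumptions, not values from the paper.
        self.K, self.n_iter, self.var_floor = n_components, n_iter, var_floor
        self.rng = np.random.default_rng(seed)

    def fit(self, X):
        T, _ = X.shape
        # Initialise means from random frames, variances from the global variance.
        self.means = X[self.rng.choice(T, self.K, replace=False)].copy()
        self.vars = np.tile(X.var(axis=0) + self.var_floor, (self.K, 1))
        self.log_w = np.full(self.K, -np.log(self.K))
        for _ in range(self.n_iter):
            # E-step: component responsibilities, computed in the log domain.
            log_p = _log_gauss_diag(X, self.means, self.vars) + self.log_w
            resp = np.exp(log_p - _logsumexp(log_p, axis=1)[:, None])  # (T, K)
            # M-step: re-estimate weights, means and floored variances.
            Nk = resp.sum(axis=0) + 1e-10
            self.log_w = np.log(Nk / T)
            self.means = (resp.T @ X) / Nk[:, None]
            self.vars = (resp.T @ X ** 2) / Nk[:, None] - self.means ** 2
            self.vars = np.maximum(self.vars, self.var_floor)
        return self

    def log_likelihood(self, X):
        """Total log-likelihood of all frames in X (frames assumed independent)."""
        log_p = _log_gauss_diag(X, self.means, self.vars) + self.log_w
        return float(np.sum(_logsumexp(log_p, axis=1)))

def identify(segment, models):
    """Return the language label whose GMM scores the segment highest."""
    return max(models, key=lambda lang: models[lang].log_likelihood(segment))

# Usage with synthetic stand-in data (real systems would use acoustic features
# such as cepstral vectors extracted from the detected speech segments).
rng = np.random.default_rng(1)
D = 4
langs = ["de", "en", "es", "fr"]
train = {lang: rng.normal(loc=i, scale=1.0, size=(2000, D)) for i, lang in enumerate(langs)}
models = {lang: DiagGMM(n_components=4, n_iter=20).fit(X) for lang, X in train.items()}

test_segments = [(lang, rng.normal(loc=i, scale=1.0, size=(200, D)))
                 for i, lang in enumerate(langs)]
errors = sum(identify(seg, models) != lang for lang, seg in test_segments)
print(f"segment error rate: {100 * errors / len(test_segments):.2f}%")
```

Scoring whole segments rather than single frames reflects the abstract's reporting of error rates on speech segments. Summing frame log-likelihoods is the simplest way to do this, under the independence assumption noted above.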
  • Keywords
    Gaussian distribution; broadcasting; decoding; languages; neural nets; speech processing; English language; European TV channels; French language; Gaussian mixture model; German language; Spanish language; acoustic features; automatic language identification; broadcast news; error rate; phone-level decoding; phonemes; raw audio data; recordings; speech extraction; speech segments; television; Acoustic testing; Broadcasting; Context modeling; Data mining; Decoding; Hidden Markov models; Natural languages; Rhythm; Speech; System testing
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Neural Networks, 2002. IJCNN '02. Proceedings of the 2002 International Joint Conference on
  • Conference_Location
    Honolulu, HI
  • ISSN
    1098-7576
  • Print_ISBN
    0-7803-7278-6
  • Type
    conf
  • DOI
    10.1109/IJCNN.2002.1007722
  • Filename
    1007722