• DocumentCode
    395477
  • Title
    Conditional pronunciation modeling in speaker detection
  • Author
    Klusacek, Dalibor ; Navratil, J. ; Reynolds, D.A. ; Campbell, J.P.
  • Volume
    4
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    We present a conditional pronunciation modeling method for the speaker detection task that does not rely on acoustic vectors. Aiming at exploiting higher-level information carried by the speech signal, it uses time-aligned streams of phones and phonemes to model a speaker's specific pronunciation. Our system uses phonemes drawn from a lexicon of pronunciations of words recognized by an automatic speech recognition system to generate the phoneme stream and an open-loop phone recognizer to generate a phone stream. The phoneme and phone streams are aligned at the frame level and conditional probabilities of a phone, given a phoneme, are estimated using co-occurrence counts. A likelihood detector is then applied to these probabilities. Performance is measured using the NIST Extended Data paradigm and the Switchboard-I corpus. Using 8 training conversations for enrollment, a 2.1% equal error rate was achieved. Extensions and alternatives, as well as fusion experiments, are presented and discussed.
  • Keywords
    error statistics; learning (artificial intelligence); linguistics; natural languages; parameter estimation; probability; speaker recognition; speech processing; NIST Extended Data paradigm; Switchboard-I corpus; acoustic vectors; automatic speech recognition system; conditional pronunciation modeling; cooccurrence counts; equal error rate; higher-level information; likelihood detector; phone stream; phoneme stream; pronunciation lexicon; speaker detection; time-aligned streams; training conversations; Acoustic signal detection; Automatic speech recognition; Detectors; Error analysis; Laboratories; Loudspeakers; Mathematics; Natural languages; Physics; Speaker recognition;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type
    conf
  • DOI
    10.1109/ICASSP.2003.1202765
  • Filename
    1202765
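
The abstract describes estimating the conditional probability of a phone given a phoneme from frame-level co-occurrence counts, then applying a likelihood detector. The following is a minimal Python sketch of that idea, not the authors' implementation. Several details are assumptions that the abstract does not specify: the add-alpha smoothing, the use of a background model, the scoring as an average log-likelihood ratio over frames, and all function and parameter names, which are hypothetical.

```python
from collections import Counter
import math


def cooccurrence_counts(phonemes, phones):
    """Count (phoneme, phone) pairs over two frame-aligned label streams."""
    if len(phonemes) != len(phones):
        raise ValueError("streams must be frame-aligned and of equal length")
    return Counter(zip(phonemes, phones))


def conditional_probs(pair_counts, phone_inventory, alpha=1.0):
    """Estimate P(phone | phoneme) from co-occurrence counts.

    Add-alpha smoothing is an assumption made here so that unseen
    pairs keep a non-zero probability.
    """
    phoneme_totals = Counter()
    for (phoneme, _phone), n in pair_counts.items():
        phoneme_totals[phoneme] += n

    probs = {}
    for phoneme, total in phoneme_totals.items():
        denom = total + alpha * len(phone_inventory)
        for phone in phone_inventory:
            count = pair_counts.get((phoneme, phone), 0)
            probs[(phoneme, phone)] = (count + alpha) / denom
    return probs


def llr_score(test_phonemes, test_phones, speaker_probs, background_probs):
    """Average per-frame log-likelihood ratio, speaker vs. background.

    Assumed detector form: frames whose (phoneme, phone) pair is missing
    from either model are skipped.
    """
    total, n = 0.0, 0
    for pair in zip(test_phonemes, test_phones):
        if pair in speaker_probs and pair in background_probs:
            total += math.log(speaker_probs[pair]) - math.log(background_probs[pair])
            n += 1
    return total / n if n else 0.0


# Toy example: the target speaker often realizes phoneme "t" as phone "d".
inventory = ["t", "d", "a"]
phoneme_stream = ["t", "t", "t", "t", "a", "a"]
speaker_model = conditional_probs(
    cooccurrence_counts(phoneme_stream, ["t", "t", "d", "d", "a", "a"]), inventory
)
background_model = conditional_probs(
    cooccurrence_counts(phoneme_stream, ["t", "t", "t", "t", "a", "a"]), inventory
)
score = llr_score(["t", "t", "a"], ["d", "d", "a"], speaker_model, background_model)
```

In this toy setup a positive score indicates that the test streams fit the speaker model better than the background model; a real detector would compare the score against a threshold tuned on held-out data.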