• DocumentCode
    395475
  • Title

    Using prosodic and conversational features for high-performance speaker recognition: report from JHU WS'02

  • Author

    Peskin, Barbara ; Navratil, Jiri ; Abramson, Joy ; Jones, Douglas ; Klusacek, David ; Reynolds, Douglas A. ; Xiang, Bing

  • Volume
    4
  • fYear
    2003
  • fDate
    6-10 April 2003
  • Abstract
    While there has been a long tradition of research seeking to use prosodic features, especially pitch, in speaker recognition systems, results have generally been disappointing when such features are used in isolation and only modest improvements have been seen when used in conjunction with traditional cepstral GMM systems. In contrast, we report here on work from the JHU 2002 Summer Workshop exploring a range of prosodic features, using as testbed the 2001 NIST Extended Data task. We examined a variety of modeling techniques, such as n-gram models of turn-level prosodic features and simple vectors of summary statistics per conversation side scored by kth nearest-neighbor classifiers. We found that purely prosodic models were able to achieve equal error rates of under 10%, and yielded significant gains when combined with more traditional systems. We also report on exploratory work on "conversational" features, capturing properties of the interaction across conversation sides, such as turn-taking patterns.
  • Keywords
    cepstral analysis; error statistics; feature extraction; natural languages; speaker recognition; speech processing; statistical analysis; NIST Extended Data task; cepstral GMM systems; conversation sides; conversation turn-taking patterns; conversational features; equal error rates; high-performance speaker recognition; n-gram models; nearest-neighbor classifiers; pitch; prosodic features; summary statistics; Automatic speech recognition; Cepstral analysis; Computer science; Error analysis; Laboratories; Loudspeakers; Performance evaluation; Speaker recognition; Statistics; System testing;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on
  • ISSN
    1520-6149
  • Print_ISBN
    0-7803-7663-3
  • Type

    conf

  • DOI
    10.1109/ICASSP.2003.1202762
  • Filename
    1202762