  • DocumentCode
    2884
  • Title
    Combination of Cepstral and Phonetically Discriminative Features for Speaker Verification
  • Author
    Sarkar, A.K. ; Cong-Thanh Do ; Viet-Bac Le ; Barras, Claude
  • Author_Institution
    LIMSI, Univ. Paris-Sud, Orsay, France
  • Volume
    21
  • Issue
    9
  • fYear
    2014
  • fDate
    Sept. 2014
  • Firstpage
    1040
  • Lastpage
    1044
  • Abstract
    Most speaker recognition systems rely on short-term acoustic cepstral features to extract speaker-relevant information from the signal. However, phonetic discriminant features, extracted by a bottle-neck multi-layer perceptron (MLP) on longer stretches of time, can provide complementary information and have been adopted in speech transcription systems. We compare speaker verification performance using cepstral features, discriminant features, and a concatenation of both followed by dimension reduction. We consider two speaker recognition systems, one based on maximum likelihood linear regression (MLLR) super-vectors and the other a state-of-the-art i-vector system with two session variability compensation schemes. Experiments are reported on a standard configuration of the NIST SRE 2008 and 2010 databases. The results show that the phonetically discriminative MLP features retain speaker-specific information that is complementary to the short-term cepstral features. A performance improvement is obtained with both score-domain and feature-domain fusion, and the speaker verification equal error rate (EER) is reduced by up to 50% relative compared to the best i-vector system using only cepstral features.
  • Keywords
    cepstral analysis; feature extraction; multilayer perceptrons; speaker recognition; speech processing; EER; MLLR super-vectors; NIST SRE 2008-2010 databases; bottle-neck multilayer perceptron; feature domain fusion; i-vector system; maximum likelihood linear regression super-vectors; phonetically discriminative MLP features; phonetically discriminative features; score domain; short-term acoustic cepstral features; speaker recognition systems; speaker verification equal error rate; speaker-relevant information; speech transcription systems; Cepstral analysis; Feature extraction; NIST; Principal component analysis; Speaker recognition; Speech; Vectors; Bottleneck features; LDA; PCA; PLDA; i-vector; multi-layer perceptron; speaker verification;
  • fLanguage
    English
  • Journal_Title
    Signal Processing Letters, IEEE
  • Publisher
    IEEE
  • ISSN
    1070-9908
  • Type
    jour
  • DOI
    10.1109/LSP.2014.2323432
  • Filename
    6814811