• DocumentCode
    118100
  • Title

    Discriminative scoring for speaker recognition based on I-vectors

  • Author

    Jun Wang ; Dong Wang ; Ziwei Zhu ; Thomas Fang Zheng ; Frank Soong

  • Author_Institution
    Center for Speaker & Language Technol. (CSLT), Tsinghua Univ., Beijing, China
  • fYear
    2014
  • fDate
    9-12 Dec. 2014
  • Firstpage
    1
  • Lastpage
    5
  • Abstract
    The popular i-vector approach to speaker recognition represents a speech segment as an i-vector in a low-dimensional space. It is well known that i-vectors involve both speaker and session variances, and therefore additional discriminative approaches are required to extract speaker information from the 'total variance' space. Among various methods, the probabilistic linear discriminant analysis (PLDA) achieves state-of-the-art performance, partly due to its generative framework that represents speaker and session variances in a hierarchical way. A disadvantage of PLDA, however, lies in its Gaussian assumption of the prior/conditional distributions on the speaker and session variables, which is not necessarily true in reality. This paper presents a discriminative scoring approach which models i-vector pairs using a neural network (NN) so that the posterior probability that an i-vector pair belongs to the same person is read off from the NN output directly. This discriminative approach does not rely on any artificial assumptions on data distributions and can learn speaker-related information with sufficient accuracy provided that the network is large enough and the training data are abundant. Our experiments on the NIST SRE08 interview speech data demonstrated that the NN-based approach outperforms PLDA in the core test condition, and combining the NN and PLDA scores leads to further gains.
  • Keywords
    Gaussian processes; learning (artificial intelligence); neural nets; probability; speaker recognition; vectors; Gaussian assumption; I-vector pair; I-vectors; NIST SRE08 interview speech data; PLDA; data distributions; discriminative scoring; neural network; posterior probability; probabilistic linear discriminant analysis; speaker recognition; speaker-related information; speech segment; training data; Artificial neural networks; Databases; Feature extraction; Speaker recognition; Speech; Training; Vectors;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asia-Pacific Signal and Information Processing Association, 2014 Annual Summit and Conference (APSIPA)
  • Conference_Location
    Siem Reap
  • Type

    conf

  • DOI
    10.1109/APSIPA.2014.7041619
  • Filename
    7041619
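
The scoring idea summarized in the abstract (a neural network that takes an i-vector pair and outputs the posterior probability that both come from the same speaker, optionally combined with a PLDA score) can be illustrated with a minimal sketch. Everything concrete here is an assumption for illustration and is not taken from the paper: the single hidden layer, the 400-dimensional i-vectors, the 64 hidden units, concatenation as the pair representation, the function names, and the linear fusion rule with its weight. The network below is untrained, so its output only shows the shape of the computation.

```python
import numpy as np

def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def init_params(ivec_dim, hidden_dim, rng):
    """Randomly initialise a one-hidden-layer network (hypothetical sizes)."""
    return {
        "W1": rng.normal(0.0, 0.1, size=(2 * ivec_dim, hidden_dim)),
        "b1": np.zeros(hidden_dim),
        "W2": rng.normal(0.0, 0.1, size=hidden_dim),
        "b2": 0.0,
    }

def same_speaker_posterior(params, ivec_a, ivec_b):
    """Return the network output for one i-vector pair, a value in (0, 1).

    The pair is represented by simple concatenation (an assumption).
    """
    x = np.concatenate([ivec_a, ivec_b])
    h = np.tanh(x @ params["W1"] + params["b1"])
    return float(sigmoid(h @ params["W2"] + params["b2"]))

def fuse_scores(nn_score, plda_score, weight=0.5):
    """Linear fusion of two scores; the weight is an assumed tuning knob."""
    return weight * nn_score + (1.0 - weight) * plda_score

# Toy usage with random vectors standing in for real i-vectors.
rng = np.random.default_rng(0)
params = init_params(ivec_dim=400, hidden_dim=64, rng=rng)
enrol = rng.normal(size=400)
test = rng.normal(size=400)
posterior = same_speaker_posterior(params, enrol, test)
```

In practice the parameters would be trained on labelled same-speaker and different-speaker pairs, and a PLDA score would come from a separately trained PLDA model before being passed to `fuse_scores`.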