• DocumentCode
    730302
  • Title
    Large-scale speaker search using PLDA on mismatched conditions
  • Author
    Ma, Jeff ; Silovsky, Jan ; Siu, Man-hung ; Kimball, Owen
  • Author_Institution
    Raytheon BBN Technol., Cambridge, MA, USA
  • fYear
    2015
  • fDate
    19-24 April 2015
  • Firstpage
    1846
  • Lastpage
    1850
  • Abstract
    Recent work on fast speaker search over large speech corpora has focused on locality sensitive hashing (LSH) search, with hashing functions that approximate i-vector based cosine distances (CosDist) for model comparisons. Because probabilistic linear discriminant analysis (PLDA) models have shown superior performance on speaker identification (SID) in recent years, this paper focuses on using PLDA for fast speaker search. PLDA is difficult to approximate well with simple hashing functions, which makes it hard to combine with LSH search. As an alternative, we adopt a clustering-based pruning strategy to speed up PLDA search. Our results show that the strategy can significantly speed up search with minimal performance loss. Another focus of this work is adapting the PLDA model to the mismatched conditions under which the fast search runs. The technique we adopt is based on the LDA adaptation method reported in [1] and primarily adapts the LDA transform. Our results show that this adaptation improves PLDA performance significantly (over 25% relative) on data collected in different conditions. Our speed-up experiments with the adapted LDA show that the gains from the adapted PLDA are retained after the speed-up.
  • Keywords
    speaker recognition; statistical analysis; LDA adaptation method; LDA transform; LSH search; PLDA search; SID; clustering-based pruning strategy; hashing functions; i-vector based cosine distances; large speech data corpora; large-scale fast speaker search; locality sensitive hashing search; probabilistic linear discriminant analysis model; speaker identification; Ports (Computers); Speech; Switches; Three-dimensional displays; I-vectors; PLDA; cosine distance; speaker search
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
  • Conference_Location
    South Brisbane, QLD
  • Type
    conf
  • DOI
    10.1109/ICASSP.2015.7178290
  • Filename
    7178290