Performance comparison of speaker recognition systems in presence of duration variability

Author

Arnab Poddar;Md Sahidullah;Goutam Saha

Author_Institution

Dept of Electronics and Electrical Communication Engineering, Indian Institute of Technology, Kharagpur, India

fYear

2015

Firstpage

1

Lastpage

6

Abstract

The performance of a speaker recognition system is highly dependent on the amount of speech data used in training and testing. In this paper, we compare the performance of two different speaker recognition systems in the presence of utterance duration variability. The first system is based on the state-of-the-art total variability framework (also known as the i-vector system), whereas the second is the classical speaker recognition system based on a Gaussian mixture model with a universal background model (GMM-UBM). We have conducted extensive experiments for different cases of length mismatch on two NIST corpora: NIST SRE 2008 and NIST SRE 2010. Our study reveals that the relative improvement of the total variability-based system over the GMM-UBM system gradually drops as the test utterance length is reduced. We also observe that if the speakers are enrolled with a sufficient amount of training data, the GMM-UBM system outperforms the i-vector system for very short test utterances.
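To make the GMM-UBM baseline concrete, the following is a minimal sketch of one common formulation: mean-only MAP adaptation of a diagonal-covariance UBM to a speaker's enrollment frames, followed by scoring a test utterance with the average per-frame log-likelihood ratio between the adapted speaker model and the UBM. It is not the configuration used in the paper. The toy two-component UBM, the synthetic 2-D "feature" frames (standing in for MFCCs), the relevance factor of 16, and all function names are hypothetical and chosen only for illustration.

```python
import numpy as np


def log_gaussians(X, means, variances):
    """Per-frame, per-component diagonal-Gaussian log densities, shape (N, C)."""
    diff = X[:, None, :] - means[None, :, :]  # (N, C, D)
    return -0.5 * (np.sum(diff ** 2 / variances[None], axis=2)
                   + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :])


def frame_loglik(X, weights, means, variances):
    """Log-likelihood of each frame under the GMM (log-sum-exp over components)."""
    a = np.log(weights)[None, :] + log_gaussians(X, means, variances)
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of the UBM to one speaker's enrollment frames."""
    a = np.log(weights)[None, :] + log_gaussians(X, means, variances)
    post = np.exp(a - a.max(axis=1, keepdims=True))
    post /= post.sum(axis=1, keepdims=True)                 # responsibilities (N, C)
    n = post.sum(axis=0)                                    # soft frame counts per component
    first = post.T @ X / np.maximum(n, 1e-10)[:, None]      # per-component data means
    alpha = (n / (n + relevance))[:, None]                  # more data -> trust data more
    return alpha * first + (1.0 - alpha) * means


def gmm_ubm_score(X_test, weights, spk_means, ubm_means, variances):
    """Average per-frame log-likelihood ratio: adapted speaker model vs. UBM."""
    return float(np.mean(frame_loglik(X_test, weights, spk_means, variances)
                         - frame_loglik(X_test, weights, ubm_means, variances)))


rng = np.random.default_rng(0)

# Hypothetical toy UBM: two 2-D components standing in for a trained UBM.
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[0.0, 0.0], [5.0, 5.0]])
ubm_var = np.ones((2, 2))


def sample(centres, n):
    """Draw n synthetic frames around the given component centres."""
    idx = rng.integers(0, len(centres), size=n)
    return centres[idx] + rng.standard_normal((n, 2))


target_centres = np.array([[1.0, 1.0], [6.0, 6.0]])
impostor_centres = np.array([[-1.0, -1.0], [4.0, 4.0]])

spk_mu = map_adapt_means(sample(target_centres, 500), ubm_w, ubm_mu, ubm_var)
target_score = gmm_ubm_score(sample(target_centres, 200), ubm_w, spk_mu, ubm_mu, ubm_var)
impostor_score = gmm_ubm_score(sample(impostor_centres, 200), ubm_w, spk_mu, ubm_mu, ubm_var)
print(target_score, impostor_score)
```

Because the score is a mean over test frames, a shorter test utterance averages fewer frames and therefore yields a noisier score; this is one intuitive way to see why test duration matters for this kind of system.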

Keywords

"NIST","Speech","Speaker recognition","Speech recognition","Mel frequency cepstral coefficient","Adaptation models","Training"

Publisher

IEEE

Conference_Title

India Conference (INDICON), 2015 Annual IEEE

Electronic_ISBN

2325-9418

Type

conf

DOI

10.1109/INDICON.2015.7443464

Filename

7443464