DocumentCode :
1667973
Title :
Speaker clustering using vector representation with long-term feature for lecture speech recognition
Author :
Huang, Chien-Lin ; Hori, Chiori ; Kashioka, Hideki ; Ma, Bin
Author_Institution :
National Institute of Information and Communications Technology, Kyoto, Japan
fYear :
2013
Firstpage :
3532
Lastpage :
3536
Abstract :
Speaker clustering is widely used to group speech data by acoustic characteristics so that unsupervised speaker normalization and speaker adaptive training can be applied to improve speech recognition performance. In this study, we present a vector space speaker clustering approach with long-term feature analysis. A supervector built from the GMM mean vectors represents the characteristics of each speaker. To obtain a robust representation, we use total variability subspace modeling, which has been successfully applied in speaker recognition to compensate for channel and session variability over the GMM mean supervector. We also apply a long-term feature analysis strategy that averages short-time spectral features over a period of time, capturing speaker traits that are manifested over speech segments longer than a single spectral frame. Experiments on lecture-style speech show that this speaker clustering approach improves speech recognition performance.
Keywords :
acoustic signal processing; pattern clustering; speaker recognition; vectors; GMM mean supervector; GMM mean vectors; acoustic characteristics; average short-time spectral features; channel variability compensation; lecture speech recognition; long-term feature analysis; session variability compensation; speaker adaptive training; speech data clustering; total variability subspace modeling; unsupervised speaker normalization; vector representation; vector space speaker clustering approach; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition; Training; Vectors; Speaker clustering; long-term feature; speech recognition; total variability
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6638315
Filename :
6638315