Regional Information Center for Science and Technology (RICEST) - Speaker clustering using vector representation with long-term feature for lecture speech recognition

DocumentCode :

1667973

Title :

Speaker clustering using vector representation with long-term feature for lecture speech recognition

Author :

Chien-Lin Huang ; Hori, Chiori ; Kashioka, Hideki ; Bin Ma

Author_Institution :

Nat. Inst. of Inf. & Commun. Technol., Kyoto, Japan

Year :

2013

Firstpage :

3532

Lastpage :

3536

Abstract :

Speaker clustering has been widely adopted for clustering the speech data based on acoustic characteristics so that an unsupervised speaker normalization and speaker adaptive training can be applied for a better speech recognition performance. In this study, we present a vector space speaker clustering approach with long-term feature analysis. The supervector based on the GMM mean vectors is adopted to represent the characteristics of speakers. To achieve a robust representation, total variability subspace modeling, which has been successfully applied in speaker recognition for compensating channel and session variability over the GMM mean supervector, is used for speaker clustering. We apply a long-term feature analysis strategy to average short-time spectral features over a period of time to capture the speaker traits that are manifested over a speech segment longer than a spectral frame. Experiments conducted on lecture style speech show that this speaker clustering approach offers a better speech recognition performance.
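
The long-term feature analysis mentioned in the abstract, averaging short-time spectral features over a period of time, might be sketched roughly as follows. This is only an illustrative sketch and not the authors' implementation: the function name long_term_average, the window length win, and the shift hop are hypothetical, and the record does not state what values the paper uses.

```python
def long_term_average(frames, win=50, hop=25):
    """Average short-time feature frames over sliding windows.

    frames: list of equal-length feature vectors (e.g. one MFCC vector per frame).
    win, hop: window length and shift in frames. Both defaults are
    illustrative assumptions, not values taken from the paper.
    Returns one averaged vector per window.
    """
    if not frames:
        raise ValueError("expected at least one feature frame")
    dim = len(frames[0])

    def mean(block):
        # Per-dimension mean over all frames in the block.
        return [sum(f[d] for f in block) / len(block) for d in range(dim)]

    # A segment no longer than one window yields a single long-term vector.
    if len(frames) <= win:
        return [mean(frames)]
    return [mean(frames[s:s + win])
            for s in range(0, len(frames) - win + 1, hop)]
```

Each returned vector summarizes a span longer than a single spectral frame. Per the abstract, the paper's speaker representation is a GMM mean supervector combined with total variability subspace modeling; that modeling stage is not shown here.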

Keywords :

acoustic signal processing; pattern clustering; speaker recognition; vectors; GMM mean supervector; GMM mean vectors; acoustic characteristics; average short-time spectral features; channel variability compensation; lecture speech recognition; long-term feature analysis; session variability compensation; speaker adaptive training; speaker recognition; speech data clustering; total variability subspace modeling; unsupervised speaker normalization; vector representation; vector space speaker clustering approach; Hidden Markov models; Mel frequency cepstral coefficient; Speech; Speech recognition; Training; Vectors; Speaker clustering; long-term feature; speech recognition; total variability;

Language :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6638315

Filename :

6638315

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1667973