DocumentCode :
1224336
Title :
Unsupervised Discriminative Training With Application to Dialect Classification
Author :
Huang, Rongqing ; Hansen, John H L
Author_Institution :
Nuance Commun., Burlington, MA
Volume :
15
Issue :
8
fYear :
2007
Firstpage :
2444
Lastpage :
2453
Abstract :
Automatic dialect classification has gained interest in the field of speech research because of its importance in characterizing speaker traits and in knowledge estimation, which could improve integrated speech technology (e.g., speech recognition, speaker recognition). This study addresses novel advances in unsupervised spontaneous dialect classification in English and Spanish. The problem considers the case where no transcripts are available for training and test data, and speakers are talking spontaneously. The Gaussian mixture model (GMM) is used for unsupervised dialect classification in our study. Techniques that address confused acoustic regions in the GMMs are proposed, where the confused regions are identified through data-driven methods. The first technique excludes confused regions by finding dialect dependence in the untranscribed audio and selecting the most discriminative Gaussian mixtures [mixture selection (MS)]. The second technique includes the confused regions in the model, but balances them over all classes. This technique is implemented by identifying discriminative frames and confused frames in the audio data [frame selection (FS)]. The new confused regions contribute to model representation but do not impact classification performance. The third technique reduces the confused regions in the original model. Minimum classification error (MCE) training is applied to achieve this objective. All three techniques implement discriminative training for GMM-based classification. Both the first technique (MS-GMM, GMM trained with mixture selection) and the second technique (FS-GMM, GMM trained with frame selection) improve dialect classification performance. Further improvement is achieved by applying the third technique (MCE training) before the first or second technique. The system is evaluated using British English dialects and Latin American Spanish dialects. Measurable improvement is achieved in both corpora. Finally, the system is compared with human listener performance and is shown to outperform human listeners in terms of classification accuracy.
Keywords :
Gaussian processes; error statistics; natural languages; signal classification; speech recognition; unsupervised learning; Gaussian mixture model; confused acoustic region; data driven method; dialect classification; minimum classification error; natural language; speech recognition; unsupervised discriminative training; Acoustic testing; Humans; Loudspeakers; Natural languages; Robustness; Speaker recognition; Speech coding; Speech recognition; System performance; Vocabulary; Accent and dialect; English dialects; Gaussian mixture selection; Spanish dialects; accent classification; automatic dialect classification; discriminative training; frame selection Gaussian mixture model (FS-GMM); minimum classification error (MCE); robust speech recognition;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2007.903302
Filename :
4317564