DocumentCode :
67973
Title :
Structured SVMs for Automatic Speech Recognition
Author :
Shi-Xiong Zhang ; Gales, Mark J.F.
Author_Institution :
Eng. Dept., Cambridge Univ., Cambridge, UK
Volume :
21
Issue :
3
fYear :
2013
fDate :
March 2013
Firstpage :
544
Lastpage :
555
Abstract :
Structured discriminative models are a flexible approach to sequence classification that enables a wide variety of features to be used. This paper describes a particular model in this framework, the structured support vector machine (SSVM), and how it can be applied to medium to large vocabulary speech recognition tasks. An important aspect of SSVMs is the form of the joint feature spaces. Here, context-dependent generative models, namely hidden Markov models, are used to obtain the features. To apply this form of combined generative and discriminative model to medium and larger vocabulary tasks, a number of issues need to be addressed. First, the features extracted are a function of the segmentation of the utterance, so a Viterbi-like scheme for obtaining the “optimal” segmentation is described. Second, SSVMs can be viewed as large-margin log-linear models with a zero-mean Gaussian prior on the discriminative parameters. However, this form of prior is not appropriate for all features. A modified training algorithm is proposed that allows general Gaussian priors to be incorporated into the large-margin criterion. Finally, to speed up the training process, a 1-slack algorithm, caching of competing hypotheses, and parallelization strategies are also described. The performance of SSVMs is evaluated on small and medium to large vocabulary speech recognition tasks: AURORA 2 and AURORA 4.
Keywords :
Gaussian processes; feature extraction; hidden Markov models; signal classification; speech recognition; support vector machines; vocabulary; 1-slack algorithm; AURORA 2; AURORA 4; Viterbi-like scheme; automatic speech recognition; caching competing hypothesis; combined generative model; context-dependent generative model; feature extraction; flexible sequence classification approach; hidden Markov model; large margin log linear model; optimal segmentation; parallelization strategy; structured SVM; structured discriminative model; structured support vector machines; training algorithm; utterance segmentation; vocabulary speech recognition task; zero mean Gaussian prior; Equations; Hidden Markov models; Joints; Mathematical model; Support vector machines; Training; Vectors; Structured support vector machines; large margin; log linear models;
fLanguage :
English
Journal_Title :
Audio, Speech, and Language Processing, IEEE Transactions on
Publisher :
IEEE
ISSN :
1558-7916
Type :
jour
DOI :
10.1109/TASL.2012.2227734
Filename :
6353551