Structured SVMs for Automatic Speech Recognition

Author

Zhang, Shi-Xiong; Gales, Mark J.F.

Author_Institution

Eng. Dept., Cambridge Univ., Cambridge, UK

Volume

21

Issue

3

fYear

2013

fDate

Mar-13

Firstpage

544

Lastpage

555

Abstract

Structured discriminative models are a flexible sequence classification approach that enables a wide variety of features to be used. This paper describes a particular model in this framework, structured support vector machines (SSVMs), and how it can be applied to medium-to-large vocabulary speech recognition tasks. An important aspect of SSVMs is the form of the joint feature space. Here, context-dependent generative models, namely hidden Markov models (HMMs), are used to obtain the features. To apply this form of combined generative and discriminative model to medium and larger vocabulary tasks, a number of issues need to be addressed. First, the features extracted are a function of the segmentation of the utterance. A Viterbi-like scheme for obtaining the “optimal” segmentation is described. Second, SSVMs can be viewed as large-margin log-linear models with a zero-mean Gaussian prior on the discriminative parameters. However, this form of prior is not appropriate for all features. A modified training algorithm is proposed that allows general Gaussian priors to be incorporated into the large-margin criterion. Finally, to speed up the training process, a 1-slack algorithm, caching of competing hypotheses, and parallelization strategies are also described. The performance of SSVMs is evaluated on small and medium-to-large vocabulary speech recognition tasks: AURORA 2 and AURORA 4, respectively.
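The large-margin criterion with a general Gaussian prior mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's implementation: the function name `ssvm_objective`, the explicit list of competing hypotheses, the trade-off constant `C`, and the averaging over utterances are all assumptions made for illustration. The sketch evaluates 0.5 (α − μ)ᵀ Σ⁻¹ (α − μ) plus an averaged structured hinge loss, where α is the discriminative parameter vector and μ, Σ are the prior mean and covariance. Setting μ = 0 and Σ = I recovers the usual zero-mean penalty 0.5 ‖α‖².

```python
import numpy as np


def ssvm_objective(alpha, mu, sigma_inv, examples, C=1.0):
    """Structured hinge loss regularised by a general Gaussian prior.

    alpha     : discriminative parameter vector
    mu        : prior mean (hypothetical; zero gives the standard SSVM)
    sigma_inv : inverse prior covariance (identity gives the standard SSVM)
    examples  : list of (phi_ref, competitors); competitors is a list of
                (phi_hyp, loss) pairs, one per competing hypothesis
    C         : trade-off constant (assumed, not taken from the paper)
    """
    diff = alpha - mu
    # Negative log of the Gaussian prior, up to an additive constant.
    prior_term = 0.5 * diff @ sigma_inv @ diff

    hinge_total = 0.0
    for phi_ref, competitors in examples:
        ref_score = alpha @ phi_ref
        # Loss-augmented score of the strongest competing hypothesis.
        worst = max(loss + alpha @ phi_hyp for phi_hyp, loss in competitors)
        # Margin violation for this utterance, clipped at zero.
        hinge_total += max(0.0, worst - ref_score)

    return prior_term + C / len(examples) * hinge_total


# Toy usage: one utterance, one competing hypothesis with loss 1.
examples = [(np.array([1.0, 0.0]), [(np.array([0.0, 1.0]), 1.0)])]
identity = np.eye(2)
zero_mean = ssvm_objective(np.array([1.0, 0.0]), np.zeros(2), identity, examples)
shifted_mean = ssvm_objective(np.array([1.0, 0.0]), np.array([1.0, 0.0]), identity, examples)
```

In this toy case the margin constraint is exactly met, so the two values differ only in the prior term: a prior centred on the current parameters adds no penalty, while the zero-mean prior does. In a real system the competing hypotheses would come from a decoder or lattice rather than an explicit list.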

Keywords

Gaussian processes; feature extraction; hidden Markov models; signal classification; speech recognition; support vector machines; vocabulary; 1-slack algorithm; AURORA 2; AURORA 4; Viterbi-like scheme; automatic speech recognition; caching competing hypothesis; combined generative model; context-dependent generative model; flexible sequence classification approach; hidden Markov model; large margin log linear model; optimal segmentation; parallelization strategy; structured SVM; structured discriminative model; structured support vector machines; training algorithm; utterance segmentation; vocabulary speech recognition task; zero mean Gaussian prior; Equations; Joints; Mathematical model; Training; Vectors; large margin; log linear models

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher

IEEE

ISSN

1558-7916

Type

jour

DOI

10.1109/TASL.2012.2227734

Filename

6353551