Efficient decoding with generative score-spaces using the expectation semiring

Author

van Dalen, Rogier C. ; Ragni, Anton ; Gales, Mark J.F.

Author_Institution

Dept. of Eng., Univ. of Cambridge, Cambridge, UK

fYear

2013

Firstpage

7619

Lastpage

7623

Abstract

State-of-the-art speech recognisers are usually based on hidden Markov models (HMMs). HMMs model a hidden symbol sequence with a Markov process and assume that the observations are independent given that sequence. These assumptions yield efficient algorithms, but limit the power of the model. A log-linear model is an alternative that allows a wide range of features, including word- and phone-level features. To handle, for example, word-level variable-length features, the original feature vectors must be segmented into words. Decoding must therefore jointly find the optimal word sequence and the optimal segmentation of the utterance into words, so features must be extracted for each possible segment of audio. For many types of features, this becomes slow. In this paper, long-span features are derived from the likelihoods of word HMMs. Derivatives of the log-likelihoods, which break the Markov assumption, are appended. Previously, decoding with this model took cubic time in the length of the sequence, and longer for higher-order derivatives. This paper shows how to decode in quadratic time.
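
The title refers to the expectation semiring, and the abstract to derivatives of HMM log-likelihoods. As a rough illustration only (not the paper's decoder, and not its quadratic-time algorithm), the sketch below runs the ordinary forward algorithm with weights from the first-order expectation semiring, so that a single pass yields both a likelihood and its derivative with respect to one parameter. The toy two-state left-to-right HMM, the unit-variance Gaussian outputs, the parameter values, and all names (`Expect`, `likelihood_and_derivative`, `mu0`, `mu1`, `a`, `b`) are assumptions made for this example.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Expect:
    """Element (p, r) of the first-order expectation semiring.

    p is a value and r its derivative with respect to one parameter.
    """
    p: float
    r: float

    def __add__(self, other):
        # Sum rule: values and derivatives add.
        return Expect(self.p + other.p, self.r + other.r)

    def __mul__(self, other):
        # Product rule: (p1 p2)' = p1 r2 + r1 p2.
        return Expect(self.p * other.p, self.p * other.r + self.r * other.p)


ZERO = Expect(0.0, 0.0)


def gaussian(x, mean):
    """Unit-variance Gaussian density (an assumption of this toy model)."""
    return math.exp(-0.5 * (x - mean) ** 2) / math.sqrt(2.0 * math.pi)


def likelihood_and_derivative(obs, mu0, mu1=2.0, a=0.6, b=0.7):
    """Forward pass over a hypothetical two-state left-to-right HMM.

    Returns Expect(likelihood, d likelihood / d mu0).
    """
    def emit(state, x):
        if state == 0:
            p = gaussian(x, mu0)
            # d/dmu0 N(x; mu0, 1) = N(x; mu0, 1) * (x - mu0)
            return Expect(p, p * (x - mu0))
        # State 1 does not depend on mu0, so its derivative is zero.
        return Expect(gaussian(x, mu1), 0.0)

    # Transition probabilities; they are constants with respect to mu0.
    trans = {(0, 0): a, (0, 1): 1.0 - a, (1, 1): b}

    alpha = [emit(0, obs[0]), ZERO]  # the model starts in state 0
    for x in obs[1:]:
        new_alpha = []
        for j in (0, 1):
            total = ZERO
            for i in (0, 1):
                weight = Expect(trans.get((i, j), 0.0), 0.0)
                total = total + alpha[i] * weight
            new_alpha.append(total * emit(j, x))
        alpha = new_alpha

    # The model must leave from state 1.
    return alpha[1] * Expect(1.0 - b, 0.0)


obs = [0.1, -0.3, 1.8, 2.2]
result = likelihood_and_derivative(obs, mu0=0.0)
# Derivative of the log-likelihood with respect to mu0.
dlog = result.r / result.p
```

Dividing the derivative component by the value component gives the derivative of the log-likelihood. Higher-order derivatives would need a higher-order semiring, which this sketch does not attempt.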

Keywords

audio signal processing; decoding; feature extraction; hidden Markov models; speech coding; speech recognition; Markov assumption; Markov process; audio segmentation; expectation semiring; feature vectors; generative score-spaces; hidden symbol sequence; higher-order derivatives; log-likelihoods; log-linear model; long-span features; phone-level features; speech recognisers; word HMM; word sequence; word utterance; word-level features; word-level variable-length features; Automata; Computational modeling; Speech; log-linear models; weighted finite-state transducers

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location

Vancouver, BC

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.2013.6639145

Filename

6639145