DocumentCode :
1691573
Title :
Efficient decoding with generative score-spaces using the expectation semiring
Author :
van Dalen, Rogier C. ; Ragni, Anton ; Gales, Mark J.F.
Author_Institution :
Dept. of Eng., Univ. of Cambridge, Cambridge, UK
fYear :
2013
Firstpage :
7619
Lastpage :
7623
Abstract :
State-of-the-art speech recognisers are usually based on hidden Markov models (HMMs). They model a hidden symbol sequence with a Markov process, with the observations independent given that sequence. These assumptions yield efficient algorithms, but limit the power of the model. An alternative model that allows a wide range of features, including word- and phone-level features, is a log-linear model. To handle, for example, word-level variable-length features, the original feature vectors must be segmented into words. Thus, decoding must find the optimal combination of segmentation of the utterance into words and word sequence. Features must therefore be extracted for each possible segment of audio. For many types of features, this becomes slow. In this paper, long-span features are derived from the likelihoods of word HMMs. Derivatives of the log-likelihoods, which break the Markov assumption, are appended. Previously, decoding with this model took cubic time in the length of the sequence, and longer for higher-order derivatives. This paper shows how to decode in quadratic time.
Keywords :
audio signal processing; decoding; feature extraction; hidden Markov models; speech coding; speech recognition; Markov assumption; Markov process; audio segmentation; decoding; expectation semiring; feature extraction; feature vectors; generative score-spaces; hidden Markov models; hidden symbol sequence; higher-order derivatives; log-likelihoods; log-linear model; long-span features; phone-level features; speech recognisers; word HMM; word sequence; word utterance; word-level features; word-level variable-length features; Automata; Computational modeling; Decoding; Feature extraction; Hidden Markov models; Speech; Speech recognition; expectation semiring; log-linear models; weighted finite-state transducers
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on
Conference_Location :
Vancouver, BC
ISSN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2013.6639145
Filename :
6639145