Acoustic modelling of subword units in the Isadora speech recognizer

Author

Schukat-Talamazzini, E.G. ; Niemann, H. ; Eckert, W. ; Kuhn, T. ; Rieck, S.

Author_Institution

Lehrstuhl fuer Inf., Erlangen Univ., Germany

Volume

1

fYear

1992

fDate

23-26 Mar 1992

Firstpage

577

Abstract

The authors address the choice of suitable subword units for the hidden Markov model (HMM)-based front end of a speaker-independent, large-vocabulary continuous speech dialog system (EVAR). In contrast to the well-known approach of using context-dependent phone-like units (for instance, generalized triphones), the authors developed inventories of larger subword units, so-called context-freezing units (CFUs). CFU models can be considered an approximation to the highly desirable situation of having whole-word HMMs, given the limited training speech data at hand. Recognition experiments indicate an advantage of the context-freezing units over triphone/biphone/phone combinations in terms of the achieved word accuracy, at least in the case of German speech. Using triphones with contexts generalized by means of broad phonetic classes, the authors achieved results comparable to those obtained with CFUs.

Keywords

hidden Markov models; speech recognition; speech recognition equipment; EVAR; German speech; Isadora speech recognizer; acoustic modelling; context-freezing units; continuous speech dialog system; generalized triphones; hidden Markov model; speaker independent recognition; subword units; training speech data; word accuracy; Acoustic devices; Contracts; Feature extraction; Frequency; Hidden Markov models; Parameter estimation; Speech recognition; Stability; Training data; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1992. ICASSP-92., 1992 IEEE International Conference on

Conference_Location

San Francisco, CA

ISSN

1520-6149

Print_ISBN

0-7803-0532-9

Type

conf

DOI

10.1109/ICASSP.1992.225843

Filename

225843