DocumentCode
3016010
Title
Continuous speech recognition by means of acoustic/phonetic classification obtained from a hidden Markov model
Author
Levinson, S.E.
Author_Institution
AT&T Bell Laboratories, Murray Hill, New Jersey
Volume
12
fYear
1987
fDate
31868
Firstpage
93
Lastpage
96
Abstract
This paper describes an experimental continuous speech recognition system comprising procedures for acoustic/phonetic classification, lexical access and sentence retrieval. Speech is assumed to be composed of a small number of phonetic units which may be identified with the states of a hidden Markov model. The acoustic correlates of the phonetic units are then characterized by the observable Gaussian process associated with the corresponding state of the underlying Markov chain. Once the parameters of such a model are determined, a phonetic transcription of an utterance can be obtained by means of a Viterbi-like algorithm. Given a lexicon in which each entry is orthographically represented in terms of the chosen phonetic units, a word lattice is produced by a lexical access procedure. Lexical items whose orthography matches subsequences of the phonetic transcription are sought by means of a hash coding technique, and their likelihoods are computed directly from the corresponding interval of acoustic measurements. The recognition process is completed by recovering, from the word lattice, the string of words of maximum likelihood conditioned on the measurements. The desired string is derived by a best-first search algorithm. In an experimental evaluation of the system, the parameters of an acoustic/phonetic model were estimated from fluent utterances of 37 seven-digit numbers. A digit recognition rate of 96% was then observed on an independent test set of 59 utterances of the same form from the same speaker. Half of the observed errors resulted from insertions, while deletions and substitutions accounted equally for the other half.
Keywords
Acoustic measurements; Acoustic testing; Gaussian processes; Hidden Markov models; Lattices; Loudspeakers; Maximum likelihood estimation; Pattern recognition; Speech recognition; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP '87.
Type
conf
DOI
10.1109/ICASSP.1987.1169629
Filename
1169629