An efficient speaker-independent automatic speech recognition by simulation of some properties of human auditory perception

Author

Hermansky, Hynek

Author_Institution

Speech Technology Laboratory, Santa Barbara, California

Volume

12

fYear

1987

fDate

April 1987

Firstpage

1159

Lastpage

1162

Abstract

An auditory model of speech perception, the Perceptually based linear predictive analysis with Root power sum metric (PLP-RPS), is applied as the front-end of an automatic speech recognizer (ASR). The PLP-RPS front-end is compared with the standard linear predictive-cepstral metric (LP-CEP) front-end, and with LP-RPS and PLP-CEP front-ends. Two-spectral-peak models are the most efficient at modeling the linguistic information in speech. Consequently, in speaker-independent ASR, high-analysis-order front-ends are less effective than low-order front-ends. Synthetic speech is used for front-end evaluation. Some of the perceptual inconsistencies of standard LP front-ends are alleviated in PLP front-ends. The PLP-RPS front-end is the most sensitive to the harmonic structure of the speech spectrum. Perceptual experiments indicate similar tendencies in human auditory perception.
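
The abstract contrasts a cepstral metric (CEP) with a root power sum metric (RPS) but does not define either. As a rough illustration only, the sketch below assumes the commonly used formulation in which the cepstral distance is the Euclidean distance between two cepstral coefficient vectors, and the RPS distance is the same distance with each coefficient weighted by its index. The function names and the sample vectors are hypothetical and are not taken from the paper.

```python
import math


def cepstral_distance(c1, c2):
    """Euclidean distance between two cepstral vectors (c_1 .. c_N).

    Assumption: the zeroth (energy) coefficient is excluded by the caller.
    """
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))


def rps_distance(c1, c2):
    """Index-weighted cepstral distance, assumed here as the RPS metric.

    Coefficient k (1-based) is weighted by k, which emphasizes
    higher-order coefficients relative to the plain cepstral distance.
    """
    return math.sqrt(
        sum((k * (a - b)) ** 2
            for k, (a, b) in enumerate(zip(c1, c2), start=1))
    )


# Hypothetical two-coefficient vectors, chosen only to show the weighting.
frame_a = [1.0, 0.0]
frame_b = [0.0, 1.0]
cep = cepstral_distance(frame_a, frame_b)
rps = rps_distance(frame_a, frame_b)
```

In this toy case both coefficients differ by the same amount, so the plain cepstral distance treats them equally while the index weighting makes the second coefficient contribute four times as much to the squared RPS distance.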

Keywords

Auditory system; Automatic speech recognition; Humans; Laboratories; Natural languages; Power harmonic filters; Predictive models; Speech analysis; Testing; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Titel

IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '87)

Type

conf

DOI

10.1109/ICASSP.1987.1169803

Filename

1169803