A comparison of feature representations for speaker-independent voiced-stop-consonant recognition

Author

Bryant, Benjamin D. ; Gowdy, John N.

Author_Institution

Dept. of Electr. & Comput. Eng., Clemson Univ., SC, USA

fYear

1993

fDate

4-7 Apr 1993

Firstpage

0.75

Abstract

The authors investigated various feature representations of speech which seem to provide robust estimates of machine-recognition-relevant parameters for the voiced-stop-consonant phoneme class. Instances of a block-windowed neural network (BWNN) were trained and tested using feature vectors extracted from data of up to four dialect regions in the TIMIT database. Three feature representations were chosen for use in this research based on their past performance in consulted feature representation studies. It is concluded that the feature representations produced by Seneff's (1988) auditory model, particularly the mean-rate response representation, are good representations for voiced-stop consonant speech as well as vowel speech. It is also concluded that the addition of dynamic feature information in the form of differenced cepstral coefficients to the conglomerate mel-cepstral representative vectors made a difference in the recognition rate for voiced-stop consonants over the use of the mel-frequency cepstral coefficients alone. It can be hypothesized that the use of the BWNN architectures produced better recognition results than the use of other architectures that do not take into account the time and frequency variabilities encountered in utterances from different speakers.

Keywords

cepstral analysis; neural nets; signal representation; speech recognition; TIMIT database; block-windowed neural network; dialect regions; differenced cepstral coefficients; dynamic feature information; feature representations; feature vectors; machine-recognition-relevant parameters; mean-rate response representation; mel-cepstral representative vectors; mel-frequency cepstral coefficients; performance; recognition rate; robust estimates; speaker-independent voiced-stop-consonant recognition; voiced-stop consonant speech; vowel speech; Cepstral analysis; Data mining; Feature extraction; Frequency; Neural networks; Parameter estimation; Robustness; Spatial databases; Speech recognition; Testing

fLanguage

English

Publisher

IEEE

Conference_Titel

Southeastcon '93, Proceedings., IEEE

Conference_Location

Charlotte, NC

Print_ISBN

0-7803-1257-0

Type

conf

DOI

10.1109/SECON.1993.465782

Filename

465782