DocumentCode
3192789
Title
A comparison of feature representations for speaker-independent voiced-stop-consonant recognition
Author
Bryant, Benjamin D. ; Gowdy, John N.
Author_Institution
Dept. of Electr. & Comput. Eng., Clemson Univ., SC, USA
fYear
1993
fDate
4-7 Apr 1993
Firstpage
0.75
Abstract
The authors investigated various feature representations of speech that appear to provide robust estimates of machine-recognition-relevant parameters for the voiced-stop-consonant phoneme class. Instances of a block-windowed neural network (BWNN) were trained and tested using feature vectors extracted from data from up to four dialect regions of the TIMIT database. Three feature representations were chosen for this research based on their performance in previously consulted feature-representation studies. It is concluded that the feature representations produced by Seneff's (1988) auditory model, particularly the mean-rate response representation, are good representations for voiced-stop-consonant speech as well as for vowel speech. It is also concluded that adding dynamic feature information, in the form of differenced cepstral coefficients, to the conglomerate mel-cepstral representative vectors made a difference in the voiced-stop-consonant recognition rate compared with using the mel-frequency cepstral coefficients alone. It can be hypothesized that the BWNN architectures produced better recognition results than other architectures that do not account for the time and frequency variability encountered in utterances from different speakers.
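The "dynamic feature information in the form of differenced cepstral coefficients" mentioned above can be illustrated with a minimal sketch. It assumes a simple first-order difference c[t] - c[t - lag] between cepstral vectors of successive frames, appended to each static vector; the abstract does not state the exact differencing window, edge handling, or vector layout used by the authors, so the function names, the `lag` parameter, and the edge-replication choice are all hypothetical. The example numbers are made up and are not TIMIT data.

```python
def differenced_cepstra(frames, lag=1):
    """Return first-order differences c[t] - c[t - lag] for each frame.

    frames: list of cepstral vectors (one list of floats per frame).
    Assumption: frames with no predecessor reuse the earliest frame
    (edge replication), so their difference vector is all zeros.
    """
    deltas = []
    for t, frame in enumerate(frames):
        prev = frames[max(t - lag, 0)]
        deltas.append([c - p for c, p in zip(frame, prev)])
    return deltas


def append_dynamic_features(frames, lag=1):
    """Concatenate each static cepstral vector with its difference vector."""
    deltas = differenced_cepstra(frames, lag)
    return [static + delta for static, delta in zip(frames, deltas)]


if __name__ == "__main__":
    # Toy 3-coefficient "cepstra" for four frames (illustrative values only).
    mfcc = [[1.0, 0.5, -0.2], [1.5, 0.4, 0.0], [2.0, 0.1, 0.3], [1.8, 0.0, 0.3]]
    for vec in append_dynamic_features(mfcc):
        print(vec)
```

Each output vector doubles in length (static coefficients followed by their differences); a regression-based delta over several frames is a common alternative to this plain difference.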
Keywords
cepstral analysis; neural nets; signal representation; speech recognition; TIMIT database; block-windowed neural network; dialect regions; differenced cepstral coefficients; dynamic feature information; feature representations; feature vectors; machine-recognition-relevant parameters; mean-rate response representation; mel-cepstral representative vectors; mel-frequency cepstral coefficients; performance; recognition rate; robust estimates; speaker-independent voiced-stop-consonant recognition; voiced-stop consonant speech; vowel speech; Cepstral analysis; Data mining; Feature extraction; Frequency; Neural networks; Parameter estimation; Robustness; Spatial databases; Speech recognition; Testing;
fLanguage
English
Publisher
IEEE
Conference_Titel
Southeastcon '93, Proceedings., IEEE
Conference_Location
Charlotte, NC
Print_ISBN
0-7803-1257-0
Type
conf
DOI
10.1109/SECON.1993.465782
Filename
465782