Title :
Effect of neural network input span on phoneme classification
Author :
Kamm, C.A. ; Singhal, S.
Abstract :
Feedforward neural networks spanning input durations of 35, 65, 125, and 245 ms were trained to classify subword speech segments from continuous utterances into 46 phoneme-like classes. The performance of networks with different input spans showed that brief sounds can be reliably detected by networks with longer input spans, but results for a subset of longer-duration phonemes (diphthongs) indicate that a network needs a wide enough view of the input to capture the salient features of the output class. The best classification performance was observed using a completely connected three-layer network with a 125-ms input span. Performance averaged 56%, ranging from 31% to 82% across phoneme classes using a top-three candidates decision criterion.
Keywords :
neural nets; speech analysis and processing; speech recognition; 35 to 245 ms; continuous utterances; neural network input span; phoneme classification; subword speech segments; three-layer network
Conference_Title :
Neural Networks, 1990., 1990 IJCNN International Joint Conference on
Conference_Location :
San Diego, CA, USA
DOI :
10.1109/IJCNN.1990.137569