Effect of neural network input span on phoneme classification

Author

Kamm, C.A. ; Singhal, S.

fYear

1990

fDate

17-21 June 1990

Firstpage

195

Abstract

Feedforward neural networks spanning input durations of 35, 65, 125, and 245 ms were trained to classify subword speech segments from continuous utterances into 46 phoneme-like classes. The performance of networks with different input spans showed that brief sounds can be reliably detected by networks with longer input spans, but results for a subset of longer-duration phonemes (diphthongs) indicate that a network needs a wide enough view of the input to capture the salient features of the output class. The best classification performance was observed using a completely connected three-layer network with a 125-ms input span. Performance averaged 56%, ranging from 31% to 82% across phoneme classes, using a top-three candidates decision criterion.

Keywords

neural nets; speech analysis and processing; speech recognition; 35 to 245 ms; continuous utterances; neural network input span; phoneme classification; subword speech segments; three-layer network

fLanguage

English

Publisher

IEEE

Conference_Titel

Neural Networks, 1990., 1990 IJCNN International Joint Conference on

Conference_Location

San Diego, CA, USA

Type

conf

DOI

10.1109/IJCNN.1990.137569

Filename

5726529

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=540203