Regional Information Center for Science and Technology - Phoneme recognition: neural networks vs. hidden Markov models

DocumentCode :

2996853

Title :

Phoneme recognition: neural networks vs. hidden Markov models

Author :

Waibel, A. ; Hanazawa, T. ; Hinton, G. ; Shikano, K. ; Lang, K.

Author_Institution :

ATR Interpreting Telephony Res. Labs., Osaka, Japan

fYear :

1988

fDate :

11-14 Apr 1988

Firstpage :

107

Abstract :

A time-delay neural network (TDNN) for phoneme recognition is discussed. With two hidden layers in addition to an input and an output layer, it is capable of representing complex nonlinear decision surfaces. Three important properties of the TDNN have been observed. First, it was able to invent, without human interference, meaningful linguistic abstractions in time and frequency, such as formant tracking and segmentation. Second, it has learned to form alternate representations linking different acoustic events with the same higher-level concept. In this fashion it can implement trading relations between lower-level acoustic events, leading to robust recognition performance despite considerable variability in the input speech. Third, the network is translation-invariant and does not rely on precise alignment or segmentation of the input. The TDNN's performance is compared with the best of hidden Markov models (HMMs) on a speaker-dependent phoneme-recognition task. The TDNN achieved a recognition rate of 98.5% compared to 93.7% for the HMM, i.e., a fourfold reduction in error.
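The "fourfold reduction in error" follows from the two recognition rates quoted in the abstract. The short Python sketch below works through that arithmetic. It assumes that the error rate is simply 100% minus the recognition rate, which the abstract does not state explicitly; the function and variable names are illustrative only.

```python
# Recognition rates as quoted in the abstract (percent correct).
tdnn_rate = 98.5
hmm_rate = 93.7


def error_rate(recognition_rate: float) -> float:
    """Error in percent, assuming error = 100 - recognition rate."""
    return 100.0 - recognition_rate


tdnn_error = error_rate(tdnn_rate)  # about 1.5 percent
hmm_error = error_rate(hmm_rate)    # about 6.3 percent

# Ratio of HMM error to TDNN error: how many times fewer errors the TDNN makes.
reduction_factor = hmm_error / tdnn_error

print(f"TDNN error: {tdnn_error:.1f}%")
print(f"HMM error: {hmm_error:.1f}%")
print(f"Reduction factor: {reduction_factor:.1f}x")
```

Under that assumption the ratio comes out at about 4.2, which is consistent with the abstract's rounded "fourfold" figure.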

Keywords :

Markov processes; neural nets; speech recognition; acoustic events; alternate representations; complex nonlinear decision surfaces; error; formant tracking; hidden Markov models; meaningful linguistic abstractions; phoneme recognition; robust recognition performance; segmentation; speaker-dependent phoneme-recognition task; speech recognition; time-delay neural network; translation-invariant; Computer networks; Delay effects; Feedforward neural networks; Hidden Markov models; Integrated circuit modeling; Laboratories; Neural networks; Pattern recognition; Speech recognition; Telephony;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Acoustics, Speech, and Signal Processing, 1988. ICASSP-88., 1988 International Conference on

Conference_Location :

New York, NY

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.1988.196523

Filename :

196523

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2996853