Title :
Recognition of isolated words of esophageal speech using GMM and gradient descent RBF networks
Author :
Malathi, P. ; Suresh, G.R.
Author_Institution :
Dept. of ECE, Prathyusha Inst. of Technol. & Manage., Chennai, India
Abstract :
A speech signal can be represented as a sequence of acoustic parameter vectors extracted from it. Each parameter vector is assumed to describe the signal over a short, specified duration during which the signal is stationary. Typical representations include Mel Frequency Cepstral Coefficients (MFCCs) and Linear Prediction Coefficients (LPCs). Isolated word recognition involves mapping these parameters to the spoken words, but a direct mapping is not possible because the realized speech waveform varies widely with speaker, modulation, context, and other factors. The parametric speech vectors are modeled using a Gaussian Mixture Model (GMM), and their distribution is observed. The Expectation Maximisation (EM) algorithm is used in the Radial Basis Function (RBF) network to best fit the test vector. A gradient descent algorithm applied to the RBF neural network is proposed to approximate highly non-linear functions. The learning rates of the network are made proportional to the probability densities obtained from the GMM. Isolated words of esophageal speech appear to be recognized better with this method than with previous methods, since the method is built from non-linear components.
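The abstract does not give implementation details, so the following is only a minimal sketch of one possible reading of the pipeline, under several assumptions that are not from the paper: features are 1-D scalars instead of MFCC/LPC vectors; the RBF hidden units reuse the GMM means as centers and the GMM standard deviations as widths; only the output weights and bias are trained by gradient descent; and each sample's learning rate is the base rate scaled by its GMM density divided by the peak density in the training set. The sine target stands in for an arbitrary non-linear function; no speech data is involved. All function names (fit_gmm, gmm_density, train_rbf, predict) are hypothetical.

```python
import math
import random

# Assumption: 1-D features for brevity; real MFCC/LPC vectors are multi-dimensional.


def gaussian_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def fit_gmm(data, k, iters=50, seed=0):
    """Fit a k-component 1-D Gaussian mixture with Expectation Maximisation."""
    rng = random.Random(seed)
    means = rng.sample(data, k)
    mu = sum(data) / len(data)
    variances = [sum((x - mu) ** 2 for x in data) / len(data)] * k
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each sample.
        resp = []
        for x in data:
            p = [w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(p) or 1e-300
            resp.append([pj / total for pj in p])
        # M-step: re-estimate mixture weights, means and variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(data)
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, 1e-6)  # guard against a collapsing component
    return weights, means, variances


def gmm_density(x, gmm):
    """Mixture density p(x) = sum_j w_j * N(x; mu_j, var_j)."""
    weights, means, variances = gmm
    return sum(w * gaussian_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def rbf_features(x, centers, widths):
    """Gaussian basis-function activations of the hidden layer."""
    return [math.exp(-(x - c) ** 2 / (2 * s ** 2)) for c, s in zip(centers, widths)]


def predict(x, model):
    """Network output: weighted sum of hidden activations plus bias."""
    centers, widths, w, b = model
    return sum(wi * p for wi, p in zip(w, rbf_features(x, centers, widths))) + b


def train_rbf(xs, ys, gmm, base_lr=0.5, epochs=200):
    """Stochastic gradient descent on the RBF output layer (squared error).
    Hypothetical: hidden units come from the GMM, and each sample's learning
    rate is proportional to its GMM density."""
    _, means, variances = gmm
    centers, widths = list(means), [math.sqrt(v) for v in variances]
    w, b = [0.0] * len(centers), 0.0
    peak = max(gmm_density(x, gmm) for x in xs)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            phi = rbf_features(x, centers, widths)
            err = sum(wi * p for wi, p in zip(w, phi)) + b - y
            lr = base_lr * gmm_density(x, gmm) / peak  # density-proportional rate
            w = [wi - lr * err * p for wi, p in zip(w, phi)]  # dE/dw_i = err * phi_i
            b -= lr * err
    return centers, widths, w, b


# Synthetic demo: two clusters of 1-D "features" and a non-linear target.
rng = random.Random(1)
xs = [rng.gauss(-2.0, 0.5) for _ in range(50)] + [rng.gauss(2.0, 0.5) for _ in range(50)]
ys = [math.sin(x) for x in xs]
gmm = fit_gmm(xs, k=2)
model = train_rbf(xs, ys, gmm)
mse_before = sum(y ** 2 for y in ys) / len(ys)  # untrained network outputs 0
mse_after = sum((predict(x, model) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)
print(f"MSE before {mse_before:.3f}, after {mse_after:.3f}")
```

Scaling the step size by density means samples in well-populated regions of the feature space move the weights more than outliers do; whether the paper normalizes the densities, and how, is not stated in the abstract.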
Keywords :
Gaussian processes; cepstral analysis; expectation-maximisation algorithm; gradient methods; mixture models; radial basis function networks; speech recognition; GMM; Gaussian mixture model; acoustic parameters; esophageal speech recognition; expectation maximisation algorithm; gradient descent RBF networks; gradient descent algorithm; isolated word recognition; linear prediction coefficients; mel frequency cepstral coefficients; network learning rate; nonlinear component; nonlinear order; parameter vectors; parametric speech vectors; probability density; radial basis function neural network; realized speech waveform; speaker variability; specified duration; speech signal representation; Feature extraction; Mel frequency cepstral coefficient; Radial basis function networks; Speech; Speech recognition; Vectors; Expectation Maximisation; Gaussian Mixture Model; Gradient Descent Radial Basis Function; Mel Frequency Cepstral Coefficients
Conference_Title :
Communication and Network Technologies (ICCNT), 2014 International Conference on
Print_ISBN :
978-1-4799-6265-5
DOI :
10.1109/CNT.2014.7062749