Title :
RNN and SOM based classifier to recognize Assamese fricative sounds designed using frame based temporal feature sets
Author :
Patgiri, Chayashree ; Sarma, M. ; Sarma, Kandarpa Kumar
Author_Institution :
Dept. of Electron. Commun. Eng., Gauhati Univ., Guwahati, India
Abstract :
In this work, a Recurrent Neural Network (RNN) is trained on cepstral features and a set of difference cepstral feature (DCF) vectors on a frame-by-frame basis. The DCF vector is formulated to capture the temporal patterns of the fricative sounds (phonemes) of the Assamese language. A hybrid algorithm is developed to recognize these fricative phonemes in words that contain them. To preserve the temporal information of the speech segment, a frame-based hybrid approach is adopted: a hybrid feature set is formed by combining simple frame-based features with differential frame-based features. Feature extraction techniques such as Linear Predictive Cepstral Coefficients (LPCC) and Mel-Frequency Cepstral Coefficients (MFCC) have been investigated, and their performance has been compared with that of the DCF set for Assamese fricative phoneme recognition. The speech segment is divided into 20 ms frames with an overlap of 10 ms for feature extraction. In addition, the difference between the current frame and its preceding and succeeding frames is used to form a more accurate dynamic approach to fricative recognition. This differential processing reduces correlation and retains only the most relevant portion of the input. After the feature vectors are obtained, a Self-Organizing Map (SOM) is used to group related features into classes and remove repeated data. The features obtained from the phoneme signal are thus reduced to different-sized sets of cluster centres provided by the SOM. The reduced feature vector is then applied to the RNN-based hybrid classifier, which learns the patterns and recognizes unknown fricative segments.
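The framing and differencing steps described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not give the exact DCF formula, so the sketch assumes the simplest reading (current frame minus preceding frame, and succeeding frame minus current frame, appended to the base features). The function names `frame_signal` and `difference_features`, the edge padding at the boundaries, and the use of NumPy are all assumptions made for illustration. Computing the cepstral coefficients themselves, the SOM reduction, and the RNN classifier are outside the scope of the sketch.

```python
import numpy as np


def frame_signal(signal, sample_rate, frame_ms=20.0, hop_ms=10.0):
    """Split a 1-D signal into 20 ms frames that overlap by 10 ms.

    A 10 ms overlap on 20 ms frames corresponds to a 10 ms hop.
    Returns an array of shape (n_frames, frame_len).
    """
    signal = np.asarray(signal, dtype=float)
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop_len
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    return signal[idx]


def difference_features(features):
    """Append frame-to-frame differences to per-frame features.

    `features` has shape (n_frames, n_coeffs), e.g. one cepstral vector
    per frame. ASSUMPTION: the differences are taken as
    (current - preceding) and (succeeding - current); the first and last
    frames are edge-padded, so their missing neighbour yields zeros.
    Returns an array of shape (n_frames, 3 * n_coeffs).
    """
    features = np.asarray(features, dtype=float)
    padded = np.pad(features, ((1, 1), (0, 0)), mode="edge")
    backward = features - padded[:-2]  # current minus preceding frame
    forward = padded[2:] - features    # succeeding minus current frame
    return np.hstack([features, backward, forward])
```

In a pipeline of the kind the abstract outlines, `frame_signal` would run first, a cepstral extractor (LPCC or MFCC) would map each frame to a coefficient vector, and `difference_features` would then build the hybrid feature set passed on to the clustering stage.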
Keywords :
cepstral analysis; feature extraction; pattern classification; recurrent neural nets; self-organising feature maps; speech recognition; DCF set; DCF vectors; LPCC; MFCC; RNN; SOM; assamese fricative phoneme recognition; assamese language; correlation; difference cepstral feature vectors; feature extraction techniques; frame-based hybrid approach; hybrid classifier; hybrid feature set; linear predictive cepstral coefficient; mel-frequency cepstral coefficient; phoneme signal; preceding frame; recurrent neural network; self organizing map; speech segment; succeeding frame; Cepstrum; Feature extraction; Mel frequency cepstral coefficient; Speech; Speech recognition; Vectors;
Conference_Title :
Neural Networks (IJCNN), 2014 International Joint Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4799-6627-1
DOI :
10.1109/IJCNN.2014.6889792