Title :
Feature bandwidth extension for Persian conversational telephone speech recognition
Author :
Goodarzi, Mohammad Mohsen ; Almasganj, Farshad ; Kabudian, Jahanshah ; Shekofteh, Yasser ; Rezaei, Iman Sarraf
Author_Institution :
Res. Center for Intell. Signal Process. (RCISP), Tehran, Iran
Abstract :
The goal of this paper is to configure a complete setup for continuous conversational telephone speech recognition in Persian. For this purpose, two common methods, Gaussian Mixture Model (GMM) and Neural Network (NN), and a proposed hybrid GMM-NN method are considered for estimating full-bandwidth features from band-limited features. The performance of these methods is evaluated with two types of features, one spectral (LFBE) and one cepstral (MFCC). The effect of speaker gender on the estimation process is also investigated. Our results show that the best phoneme recognition accuracy is obtained when MFCC features are reconstructed using two gender-dependent neural networks. In this configuration, phoneme accuracy was about 1.6% higher than the baseline. The tests were carried out on the TFarsDat corpus.
Keywords :
Gaussian processes; estimation theory; neural nets; speaker recognition; telephony; Gaussian mixture model; LFBE; MFCC features; Persian continuous conversational telephone speech recognition; TFarsDat corpus; cepstral based features; estimation process; feature bandwidth extension; hybrid GMM-NN method; neural network; phoneme recognition; speaker gender dependent neural networks; spectral based features; Artificial neural networks; Databases; Estimation; Matched filters; Mel frequency cepstral coefficient; Speech; Speech recognition; conversational telephony speech recognition
Conference_Title :
Electrical Engineering (ICEE), 2012 20th Iranian Conference on
Conference_Location :
Tehran
Print_ISBN :
978-1-4673-1149-6
DOI :
10.1109/IranianCEE.2012.6292541