Title :
Comparison of performance of the features of speech signal for non-intrusive speech quality assessment
Author :
Parmar, Nidhi ; Dubey, Rajesh Kumar
Author_Institution :
Electron. & Commun. Dept., Jaypee Inst. of Inf. Technol., Noida, India
Abstract :
In this work, two different features of the speech signal are compared for non-intrusive speech quality assessment: one based on mel-frequency cepstral coefficients (MFCC) and the other based on reconstructed phase spaces (RPS). Both are frequently used in speech recognition systems. The focus of the work is to compare the performance of RPS-based features with that of MFCC-based features for non-intrusive speech quality evaluation. MFCC closely approximates the human auditory system and is used in speech quality evaluation. RPS-based features capture the true dynamics of the original speech signal and the non-linear characteristics of the human speech production system. The RPS approach replaces conventionally used power spectrum estimation (PSE) methods such as the FFT with two-dimensional DFT methods. A Gaussian Mixture Model (GMM) is used to map these features to the mean opinion score (MOS). The features are evaluated on the ITU-T Supplement-23 database, and their performance is compared in terms of the correlation coefficient between the subjective MOS and the objective MOS.
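As a rough illustration of two steps named in the abstract, the sketch below builds a reconstructed phase space by time-delay embedding and computes the correlation coefficient between subjective and objective MOS. It is a minimal sketch, not the authors' implementation: the function names, the embedding dimension `dim`, the delay `lag`, and the toy values are all assumptions chosen for illustration, and the correlation is assumed to be the Pearson coefficient. The GMM mapping and the two-dimensional DFT step are not shown.

```python
import numpy as np


def reconstruct_phase_space(signal, dim=3, lag=2):
    """Time-delay embedding (hypothetical helper, not from the paper).

    Row n of the result is [x[n], x[n + lag], ..., x[n + (dim - 1) * lag]].
    """
    x = np.asarray(signal, dtype=float)
    n_points = len(x) - (dim - 1) * lag
    if n_points <= 0:
        raise ValueError("signal too short for the requested dim and lag")
    # Each column is the signal shifted by a multiple of the lag.
    return np.stack([x[i * lag : i * lag + n_points] for i in range(dim)], axis=1)


def pearson_correlation(subjective_mos, objective_mos):
    """Pearson correlation between subjective and objective MOS values."""
    s = np.asarray(subjective_mos, dtype=float)
    o = np.asarray(objective_mos, dtype=float)
    return float(np.corrcoef(s, o)[0, 1])


# Toy data (illustrative only, not taken from ITU-T Supplement-23).
frame = np.arange(10.0)
rps = reconstruct_phase_space(frame, dim=3, lag=2)  # shape (6, 3)
r = pearson_correlation([1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0])
```

In practice the embedding would be applied to short speech frames, and `dim` and `lag` would be chosen from the data; the values here are placeholders.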
Keywords :
Gaussian processes; cepstral analysis; discrete Fourier transforms; fast Fourier transforms; mixture models; speech recognition; 2D DFT methods; FFT; Gaussian mixture model; ITU-T Supplement-23 database; MFCC based features; MOS; PSE method; RPS based features; discrete Fourier transform; fast Fourier transform; human auditory system; human speech production system; mean opinion score; mel-frequency cepstral coefficients; nonintrusive speech quality assessment; power spectrum estimation method; reconstructed phase spaces; speech quality evaluation; speech recognition system; speech signal features; Auditory system; Correlation coefficient; Databases; Feature extraction; Mel frequency cepstral coefficient; Quality assessment; Speech; Expectation Maximization; Gaussian Mixture Model; Non-intrusive Speech Quality; Objective MOS; Phase Spase Estimation; Reconstructed Phase Spaces; Subjective MOS;
Conference_Title :
Signal Processing and Communication (ICSC), 2015 International Conference on
Conference_Location :
Noida
Print_ISBN :
978-1-4799-6760-5
DOI :
10.1109/ICSPCom.2015.7150655