Title :
Non-intrusive speech quality assessment using multi-resolution auditory model features for degraded narrowband speech
Author :
Dubey, Rajesh Kumar ; Kumar, Arun
Author_Institution :
Center for Appl. Res. in Electron., Indian Inst. of Technol.-Delhi, New Delhi, India
Abstract :
A multi-resolution framework based on an auditory perception-based wavelet packet transform is incorporated in a multi-resolution auditory model (MRAM) and used for non-intrusive objective speech quality estimation. The MRAM provides a more detailed time-frequency model of the human auditory system than earlier models used for non-intrusive speech quality estimation. The objective mean opinion score (MOS) of a degraded narrowband speech utterance is estimated with a Gaussian mixture model (GMM) probabilistic approach using an MRAM-based feature vector. Additionally, features based on a recent auditory model (Lyon's auditory model), mel-frequency cepstral coefficients (MFCC), and line spectral frequencies (LSF) have each been used independently for comparison with the performance of the MRAM features. A combination of MFCC and LSF features with MRAM features is also proposed and investigated for non-intrusive speech quality estimation using the GMM probabilistic approach. The performance of these feature vectors is evaluated and compared with ITU-T Recommendation P.563 and a recently published method by computing the correlation coefficient and root-mean-square error between the subjective MOS and the estimated objective MOS. The proposed method, which combines MRAM, MFCC, and LSF feature vectors for non-intrusive speech quality estimation, is found to perform better than both of these reference algorithms.
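The two evaluation measures named in the abstract, the correlation coefficient and the root-mean-square error between subjective and estimated MOS, can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes the Pearson form of the correlation coefficient (the abstract does not say which form is used), the function names are hypothetical, and the MOS values are made-up placeholders rather than results from the paper.

```python
import math


def pearson_correlation(subjective, objective):
    """Pearson correlation coefficient between two equal-length score lists."""
    n = len(subjective)
    if n != len(objective) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_s = sum(subjective) / n
    mean_o = sum(objective) / n
    cov = sum((s - mean_s) * (o - mean_o) for s, o in zip(subjective, objective))
    var_s = sum((s - mean_s) ** 2 for s in subjective)
    var_o = sum((o - mean_o) ** 2 for o in objective)
    if var_s == 0 or var_o == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / math.sqrt(var_s * var_o)


def rmse(subjective, objective):
    """Root-mean-square error between two equal-length score lists."""
    n = len(subjective)
    if n != len(objective) or n == 0:
        raise ValueError("need two equal-length, non-empty lists")
    return math.sqrt(sum((s - o) ** 2 for s, o in zip(subjective, objective)) / n)


# Placeholder scores for illustration only (not data from the paper).
subjective_mos = [1.8, 2.5, 3.1, 3.9, 4.4]
estimated_mos = [2.0, 2.4, 3.3, 3.6, 4.5]

print("correlation:", pearson_correlation(subjective_mos, estimated_mos))
print("RMSE:", rmse(subjective_mos, estimated_mos))
```

A higher correlation and a lower RMSE indicate that the estimated objective MOS tracks the subjective MOS more closely.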
Keywords :
Gaussian processes; cepstral analysis; feature extraction; mixture models; probability; speech processing; wavelet transforms; GMM probabilistic approach; Gaussian mixture model probabilistic approach; Lyons auditory model; MFCC features; MRAM-based feature vector; auditory perception-based wavelet packet transform; degraded narrowband speech utterance; human auditory system; line spectral frequencies features; mel-frequency cepstral coefficients; multiresolution auditory model features; nonintrusive objective speech quality estimation; nonintrusive speech quality assessment; objective mean opinion score; time-frequency modelling;
Journal_Title :
Signal Processing, IET
DOI :
10.1049/iet-spr.2014.0214