Title :
Kernel Power Flow Orientation Coefficients for Noise-Robust Speech Recognition
Author :
Gerazov, Branislav ; Ivanovski, Zoran
Author_Institution :
Fac. of Electr. Eng. & Inf. Technol., Univ. of Ss. Cyril & Methodius, Skopje, Macedonia
Abstract :
Noise robustness has become a crucial requirement for Automatic Speech Recognition (ASR) systems, given their increased use in noise-filled real-world environments. One way to address this issue is to develop features that are innately noise-robust. The Kernel Power flow Orientation Coefficients (KPOCs) are a novel feature set based on spectro-temporal analysis that uses a bank of 2D kernels to extract the dominant orientation of the power flow at each point in the auditory spectrogram of the speech signal. The collection of dominant power flow orientation angles forms a novel representation of the speech signal named the Power flow Orientation Spectrogram (POS), which is innately resistant to the spectral masking introduced by the presence of noise and reverberation. This approach not only gives KPOCs their noise robustness, but also keeps the number of output coefficients inherently small, thus eliminating the need for the feature dimensionality reduction otherwise necessary in the conventional spectro-temporal approach. KPOC performance has been evaluated on three experimental frameworks, and the results show that KPOCs outperform a number of well-known noise-robust features at average and low SNRs. The relative improvement in Word Recognition Accuracy (WRA) over the classic Mel Frequency Cepstral Coefficients (MFCCs) for the Aurora 2 task ranges from 32% up to 190% for SNRs from 10 dB down to -5 dB. The experimental results also show that, with clean training, the performance of KPOCs approaches that of state-of-the-art noise-robust ASR front ends in all noise scenarios for small-vocabulary ASR tasks.
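The core idea described in the abstract, picking the dominant orientation at each spectrogram point from the responses of a bank of oriented 2D kernels, can be illustrated with a small sketch. The abstract does not specify the kernel shape, the number of orientations, or how the auditory spectrogram is computed, so everything below is an assumption made for illustration: the elongated zero-mean Gaussian kernels, the parameters `size`, `sigma_along`, `sigma_across` and `n_orient`, and the function names are hypothetical and are not taken from the paper.

```python
import math


def oriented_kernel(theta, size=7, sigma_along=3.0, sigma_across=0.7):
    """Hypothetical kernel: zero-mean Gaussian ridge elongated along angle theta.

    Rows are frequency offsets, columns are time offsets; theta is in radians,
    measured from the time axis.
    """
    half = size // 2
    kernel = []
    for r in range(-half, half + 1):
        row = []
        for c in range(-half, half + 1):
            along = c * math.cos(theta) + r * math.sin(theta)
            across = -c * math.sin(theta) + r * math.cos(theta)
            row.append(math.exp(-0.5 * ((along / sigma_along) ** 2
                                        + (across / sigma_across) ** 2)))
        kernel.append(row)
    # Subtract the mean so a flat (unstructured) region gives zero response.
    mean = sum(map(sum, kernel)) / (size * size)
    return [[v - mean for v in row] for row in kernel]


def response(spec, kernel, i, j):
    """Correlate one kernel with the spectrogram at point (i, j), zero-padded."""
    half = len(kernel) // 2
    total = 0.0
    for dr in range(-half, half + 1):
        for dc in range(-half, half + 1):
            r, c = i + dr, j + dc
            if 0 <= r < len(spec) and 0 <= c < len(spec[0]):
                total += kernel[dr + half][dc + half] * spec[r][c]
    return total


def orientation_map(spec, n_orient=8, size=7):
    """Return, for every point, the angle of the kernel with the largest response."""
    angles = [k * math.pi / n_orient for k in range(n_orient)]
    bank = [oriented_kernel(a, size) for a in angles]
    out = []
    for i in range(len(spec)):
        row = []
        for j in range(len(spec[0])):
            responses = [response(spec, k, i, j) for k in bank]
            best = max(range(n_orient), key=lambda n: responses[n])
            row.append(angles[best])
        out.append(row)
    return out


if __name__ == "__main__":
    # Toy "spectrogram": a single horizontal ridge (constant frequency over time).
    spec = [[1.0 if r == 10 else 0.0 for c in range(21)] for r in range(21)]
    pos = orientation_map(spec)
    print(math.degrees(pos[10][10]))  # angle selected on the ridge, in degrees
```

On this toy input the angle selected on the ridge is 0, the time-axis direction. The sketch stores only one angle per point rather than the full set of kernel responses, which is one plausible reading of why the abstract describes the output dimensionality as inherently small.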
Keywords :
feature extraction; load flow; operating system kernels; reverberation; speech intelligibility; speech recognition; Aurora 2 task; KPOC; MFCC; POS; WRA; auditory spectrogram; automatic speech recognition; dominant orientation; feature dimensionality reduction; feature set; kernel power flow orientation coefficients; mel frequency cepstral coefficients; noise-filled real-world environments; noise-robust features; noise-robust speech recognition; power flow orientation spectrogram; reverberation; spectral masking; spectrotemporal analysis; speech signal; word recognition accuracy; Feature extraction; Filter banks; Gabor filters; Kernel; Load flow; Spectrogram; Speech; 2D kernels; features; noise-robust; spectro-temporal; speech recognition;
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2014.2384274