Title :
Shift-invariant features for speech activity detection in adverse radio-frequency channel conditions
Author :
Omar, Mohamed K. ; Ganapathy, Shrikanth
Author_Institution :
IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA
Abstract :
This work presents a novel approach to speech activity detection under highly degraded radio-frequency channel conditions. In this approach, the audio stream is segmented into short homogeneous segments, and each segment is represented by shift-invariant features. These features provide a coarse histogram-based description of the high-energy trajectories in the time-frequency domain. They are less sensitive to frequency shifting than traditional filterbank-based features such as Mel-Frequency Cepstral Coefficients (MFCC) and Perceptual Linear Prediction (PLP) coefficients. We evaluate our approach on the speech activity detection task of the Robust Automatic Transcription of Speech (RATS) program. Our experiments show relative improvements of up to 29% in total error over the PLP-based baseline system on four radio-frequency channels used in RATS.
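The shift-invariance idea in the abstract can be illustrated with a toy sketch. This is not the authors' feature extraction, which the abstract does not specify. It assumes a segment is given as a magnitude spectrogram and describes it by a histogram of how the peak-energy frequency bin moves from frame to frame. The function names `trajectory_histogram` and `toy_spectrogram`, the `max_step` parameter, and the synthetic data are all hypothetical.

```python
import numpy as np

def trajectory_histogram(spec, max_step=3):
    """Normalized histogram of frame-to-frame moves of the peak-energy bin.

    spec: 2-D array of shape (n_freq_bins, n_frames).
    Illustrative assumption only, not the feature defined in the paper.
    """
    peak_bins = np.argmax(spec, axis=0)            # dominant frequency bin per frame
    steps = np.clip(np.diff(peak_bins), -max_step, max_step)
    edges = np.arange(-max_step, max_step + 2) - 0.5   # one bin per integer step
    hist, _ = np.histogram(steps, bins=edges)
    return hist / max(hist.sum(), 1)

def toy_spectrogram(track, n_bins=64):
    """Synthetic spectrogram with a single high-energy ridge along `track`."""
    spec = np.full((n_bins, len(track)), 0.01)
    spec[track, np.arange(len(track))] = 1.0
    return spec

# The same trajectory, once at its original position and once shifted up by 5 bins.
track = np.array([10, 11, 12, 12, 11, 13, 14, 14, 13, 12])
original = trajectory_histogram(toy_spectrogram(track))
shifted = trajectory_histogram(toy_spectrogram(track + 5))

print(np.allclose(original, shifted))
```

Because the histogram depends only on relative movement of the ridge, a constant frequency offset leaves it unchanged, whereas a fixed filterbank would assign the shifted energy to different channels.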
Keywords :
audio signal processing; cepstral analysis; speech processing; wireless channels; MFCC; PLP; RATS; audio stream; frequency shifting; mel-frequency cepstral coefficients; perceptual linear prediction coefficients; radiofrequency channel conditions; robust automatic transcription of speech; segmental modelling; shift invariant features; speech activity detection; time-frequency domain; Histograms; Rats; Speech; Speech processing; Time-frequency analysis; Training; Training data; invariant features; segmental modeling
Conference_Title :
Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on
Conference_Location :
Florence
DOI :
10.1109/ICASSP.2014.6854818