Title :
Query-by-Example Spoken Term Detection using Frequency Domain Linear Prediction and Non-Segmental Dynamic Time Warping
Author :
Mantena, Gautam ; Achanta, Sivanand ; Prahallad, K.
Author_Institution :
Int. Inst. of Inf. Technol. (IIIT-H), Hyderabad, India
Abstract :
The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge about the language of the audio data, and thus explore dynamic time warping (DTW) based techniques for the QbE-STD task. In this paper, we use a variant of the DTW algorithm, referred to as non-segmental DTW (NS-DTW), which has a computational upper bound of O(mn), and we analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. The results show that frequency domain linear prediction cepstral coefficients, which capture the temporal dynamics of the speech signal, can be used as an alternative to traditional spectral parameters such as linear prediction cepstral coefficients, perceptual linear prediction cepstral coefficients and Mel-frequency cepstral coefficients. We also introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that the computational upper bound for FNS-DTW is O(mn/α²), which is lower than that of NS-DTW.
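The two complexity bounds in the abstract can be illustrated with a minimal sketch. It is not the authors' NS-DTW or FNS-DTW: it uses a generic subsequence DTW, and it makes several assumptions that the abstract does not state. These are that m and n are the query and reference lengths in frames, that the local distance is the negative log of the dot product of two posterior vectors, and that "reduced feature vectors" means averaging every α consecutive frames. The names `neg_log_dot`, `subsequence_dtw` and `reduce_frames` are hypothetical.

```python
import math


def neg_log_dot(u, v):
    """Local distance between two posterior vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    return -math.log(max(dot, 1e-10))  # floor avoids log(0)


def subsequence_dtw(query, reference):
    """Find the best match of `query` anywhere inside `reference`.

    Returns (cost per query frame, index of the last matched reference frame).
    The table has m * n cells, so the work is O(mn).
    """
    m, n = len(query), len(reference)
    # First row: a match may start at any reference frame at no extra cost.
    prev = [neg_log_dot(query[0], reference[j]) for j in range(n)]
    for i in range(1, m):
        cur = [0.0] * n
        cur[0] = prev[0] + neg_log_dot(query[i], reference[0])
        for j in range(1, n):
            best = min(prev[j], cur[j - 1], prev[j - 1])
            cur[j] = best + neg_log_dot(query[i], reference[j])
        prev = cur
    # Last row: a match may end at any reference frame.
    end = min(range(n), key=prev.__getitem__)
    return prev[end] / m, end


def reduce_frames(frames, alpha):
    """Average each run of `alpha` consecutive frames (assumed reduction)."""
    reduced = []
    for start in range(0, len(frames), alpha):
        chunk = frames[start:start + alpha]
        reduced.append([sum(col) / len(chunk) for col in zip(*chunk)])
    return reduced


def one_hot_like(label, size=4, peak=0.97):
    """Toy posterior vector: most of the mass on one class."""
    rest = (1.0 - peak) / (size - 1)
    return [peak if k == label else rest for k in range(size)]


reference = [one_hot_like(c) for c in [0, 0, 1, 1, 2, 2, 3, 3, 0, 0]]
query = [one_hot_like(c) for c in [2, 2, 3, 3]]

# Full-resolution search: 4 * 10 = 40 table cells.
cost, end = subsequence_dtw(query, reference)

# Reduced search with alpha = 2: 2 * 5 = 10 table cells, i.e. 40 / alpha**2.
alpha = 2
cost_r, end_r = subsequence_dtw(reduce_frames(query, alpha),
                                reduce_frames(reference, alpha))
```

Under these assumptions both sequences shrink by a factor of α, so the table shrinks from mn cells to roughly mn/α² cells. A match end found at the reduced resolution corresponds to a window of α frames in the original reference, so the speed-up is paid for with coarser time resolution.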
Keywords :
Gaussian processes; cepstral analysis; frequency-domain analysis; query processing; signal detection; speech recognition; time warp simulation; DTW based algorithm; Gaussian posteriorgrams; Mel-frequency cepstral coefficients; QbE-STD; computational upper bound; fast NS-DTW; frequency domain linear prediction cepstral coefficients; nonsegmental dynamic time warping; perceptual linear prediction cepstral coefficients; query-by-example spoken term detection; speech signal; spoken audio data; spoken query; traditional spectral parameters; Computational modeling; Frequency-domain analysis; Mel frequency cepstral coefficient; Speech; Speech processing; Vectors; Dynamic time warping; fast search; frequency domain linear prediction; query-by-example spoken term detection;
Journal_Title :
Audio, Speech, and Language Processing, IEEE/ACM Transactions on
DOI :
10.1109/TASLP.2014.2311322