Query-by-Example Spoken Term Detection using Frequency Domain Linear Prediction and Non-Segmental Dynamic Time Warping

Author

Mantena, Gautam ; Achanta, Sivanand ; Prahallad, K.

Author_Institution

Int. Inst. of Inf. Technol. (IIIT-H), Hyderabad, India

Volume

22

Issue

5

fYear

2014

fDate

May-14

Firstpage

946

Lastpage

955

Abstract

The task of query-by-example spoken term detection (QbE-STD) is to find a spoken query within spoken audio data. Current state-of-the-art techniques assume zero prior knowledge about the language of the audio data, and thus explore dynamic time warping (DTW) based techniques for the QbE-STD task. In this paper, we use a variant of the DTW algorithm referred to as non-segmental DTW (NS-DTW), with a computational upper bound of O(mn), and analyze the performance of QbE-STD with Gaussian posteriorgrams obtained from spectral and temporal features of the speech signal. The results show that frequency domain linear prediction cepstral coefficients, which capture the temporal dynamics of the speech signal, can be used as an alternative to traditional spectral parameters such as linear prediction cepstral coefficients, perceptual linear prediction cepstral coefficients and Mel-frequency cepstral coefficients. We also introduce another variant of NS-DTW called fast NS-DTW (FNS-DTW), which uses reduced feature vectors for search. With a reduction factor of α ∈ ℕ, we show that the computational upper bound for FNS-DTW is O(mn/α²), which is lower than that of NS-DTW.
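
The two complexity bounds in the abstract can be illustrated with a minimal sketch. This is not the authors' NS-DTW or FNS-DTW: it is a generic subsequence DTW, and the function names, the negative-log dot-product distance, the three-way step pattern, and the reduction by averaging α consecutive frames are all assumptions made for illustration. It also assumes m and n are the frame counts of the query and the reference, which the abstract does not state. The sketch only shows why filling an m × n cost matrix takes O(mn) work, and why shortening both sequences by a factor α leaves roughly mn/α² cells.

```python
import math

def reduce_frames(frames, alpha):
    """Average each run of `alpha` consecutive frames (assumed reduction scheme)."""
    reduced = []
    for start in range(0, len(frames), alpha):
        chunk = frames[start:start + alpha]
        dim = len(chunk[0])
        reduced.append([sum(f[d] for f in chunk) / len(chunk) for d in range(dim)])
    return reduced

def neg_log_dot(p, q, eps=1e-10):
    """Assumed local distance between posterior vectors: -log of their dot product."""
    return -math.log(max(sum(a * b for a, b in zip(p, q)), eps))

def subsequence_dtw(query, reference):
    """Generic subsequence DTW (illustrative, not the paper's NS-DTW).

    Returns (best_cost, end_index, cells_filled). The match may start at any
    reference frame; one cell is filled per (query frame, reference frame) pair.
    """
    m, n = len(query), len(reference)
    # First row: the query may begin anywhere in the reference at no extra cost.
    prev = [neg_log_dot(query[0], reference[j]) for j in range(n)]
    cells = n
    for i in range(1, m):
        cur = [float("inf")] * n
        for j in range(n):
            best = prev[j]                                # vertical step
            if j > 0:
                best = min(best, prev[j - 1], cur[j - 1])  # diagonal, horizontal
            cur[j] = neg_log_dot(query[i], reference[j]) + best
            cells += 1
        prev = cur
    end = min(range(n), key=prev.__getitem__)
    return prev[end], end, cells

def soft_one_hot(k, dim=3):
    """Toy posterior vector with most of its mass on class k."""
    return [0.9 if d == k else 0.05 for d in range(dim)]

# Toy data: a 4-frame query hidden at frames 4..7 of a 12-frame reference.
query = [soft_one_hot(k) for k in (0, 1, 2, 1)]
reference = [soft_one_hot(k) for k in (2, 2, 2, 2, 0, 1, 2, 1, 0, 0, 0, 0)]

cost, end, cells = subsequence_dtw(query, reference)
print("full search:    end frame", end, "cells", cells)

alpha = 2  # hypothetical reduction factor
cost_r, end_r, cells_r = subsequence_dtw(
    reduce_frames(query, alpha), reduce_frames(reference, alpha)
)
print("reduced search: end frame", end_r, "cells", cells_r)
```

In this toy run the reduced search fills a quarter of the cells of the full search for α = 2, matching the 1/α² factor in the bound. The end frame of the reduced search is expressed in reduced frames, so it locates the match only to within α original frames.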

Keywords

Gaussian processes; cepstral analysis; frequency-domain analysis; query processing; signal detection; speech recognition; time warp simulation; DTW based algorithm; Gaussian posteriorgrams; Mel-frequency cepstral coefficients; QbE-STD; computational upper bound; fast NS-DTW; frequency domain linear prediction cepstral coefficients; nonsegmental dynamic time warping; perceptual linear prediction cepstral coefficients; query-by-example spoken term detection; speech signal; spoken audio data; spoken query; traditional spectral parameters; Computational modeling; Frequency-domain analysis; Mel frequency cepstral coefficient; Speech; Speech processing; Vectors; Dynamic time warping; fast search; frequency domain linear prediction; query-by-example spoken term detection;

fLanguage

English

Journal_Title

Audio, Speech, and Language Processing, IEEE/ACM Transactions on

Publisher

ieee

ISSN

2329-9290

Type

jour

DOI

10.1109/TASLP.2014.2311322

Filename

6763005