Regional Information Center for Science and Technology - High-performance Query-by-Example Spoken Term Detection on the SWS 2013 evaluation

DocumentCode :

180470

Title :

High-performance Query-by-Example Spoken Term Detection on the SWS 2013 evaluation

Author :

Rodriguez-Fuentes, Luis Javier ; Varona, Amparo ; Penagarikano, Mike ; Bordel, German ; Diez, Mireia

Author_Institution :

Software Technol. Working Group, Univ. of the Basque Country, Leioa, Spain

fYear :

2014

fDate :

4-9 May 2014

Firstpage :

7819

Lastpage :

7823

Abstract :

In recent years, the task of Query-by-Example Spoken Term Detection (QbE-STD), which aims to find occurrences of a spoken query in a set of audio documents, has attracted the interest of the research community because of its versatility in settings where untranscribed, multilingual and acoustically unconstrained spoken resources, or spoken resources in low-resource languages, must be searched. This paper describes, and reports experimental results for, a QbE-STD system that achieved the best performance in the recent Spoken Web Search (SWS) evaluation, held as part of MediaEval 2013. Though not optimized for speed, the system operates faster than real time. The system exploits high-performance phone decoders to extract frame-level phone posteriors (a common representation in QbE-STD tasks). Then, given a query and an audio document, a distance matrix is computed between their phone posterior representations, followed by a newly introduced distance normalization technique and an iterative Dynamic Time Warping (DTW) matching procedure with some heuristic pruning. Results show that remarkable performance improvements can be achieved by using multiple examples per query and, especially, through the late (score-level) fusion of different subsystems, each based on a different set of phone posteriors.
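The matching pipeline outlined in the abstract (distance matrix over phone posteriors, distance normalization, DTW search) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the record: the `-log` dot-product distance, the per-query-frame min-max normalization, the three allowed DTW steps, and all function names. The sketch returns only the single best match; the iterative search and heuristic pruning mentioned in the abstract are not reproduced.

```python
import numpy as np

def distance_matrix(query, doc, eps=1e-10):
    """-log of the dot product for every (query frame, document frame) pair.
    Assumed distance; inputs are (frames x phones) posterior matrices."""
    return -np.log(np.clip(query @ doc.T, eps, None))

def normalize_rows(dist):
    """Min-max scale each query-frame row to [0, 1] (assumed normalization)."""
    lo = dist.min(axis=1, keepdims=True)
    hi = dist.max(axis=1, keepdims=True)
    return (dist - lo) / np.maximum(hi - lo, 1e-10)

def subsequence_dtw(dist):
    """Align the whole query against any span of the document.
    Paths minimize accumulated cost; the end frame is then chosen by
    average cost per path step. Returns (avg_cost, start_frame, end_frame)."""
    n, m = dist.shape
    cost = np.full((n, m), np.inf)
    length = np.ones((n, m), dtype=int)
    start = np.tile(np.arange(m), (n, 1))
    cost[0] = dist[0]  # a match may start at any document frame
    for i in range(1, n):
        for j in range(m):
            # Allowed predecessors: vertical, horizontal, diagonal.
            cands = [(i - 1, j)]
            if j > 0:
                cands += [(i, j - 1), (i - 1, j - 1)]
            pi, pj = min(cands, key=lambda c: cost[c])
            cost[i, j] = cost[pi, pj] + dist[i, j]
            length[i, j] = length[pi, pj] + 1
            start[i, j] = start[pi, pj]
    avg = cost[-1] / length[-1]
    end = int(np.argmin(avg))
    return float(avg[end]), int(start[-1, end]), end

def toy_posteriors(peaks, n_phones=3):
    """Hypothetical helper: posteriors with mass 0.9 on one phone per frame."""
    p = np.full((len(peaks), n_phones), 0.05)
    p[np.arange(len(peaks)), peaks] = 0.9
    return p

# Toy data: the query's phone sequence 0, 1, 2 occupies document frames 3 to 5.
query = toy_posteriors([0, 1, 2])
doc = toy_posteriors([1, 1, 2, 0, 1, 2, 1, 1])
score, start_frame, end_frame = subsequence_dtw(
    normalize_rows(distance_matrix(query, doc)))
print(score, start_frame, end_frame)
```

Normalizing each row separately puts every query frame on a comparable scale, so a frame whose posteriors are uniformly poor matches cannot dominate the path cost. Whether this corresponds to the normalization introduced in the paper cannot be determined from the abstract alone.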

Keywords :

feature extraction; signal detection; speech recognition; DTW matching procedure; MediaEval 2013; QbE-STD system; SWS 2013 evaluation; acoustically unconstrained spoken resources; audio documents; distance normalization technique; frame-level phone posteriors extraction; high-performance phone decoders; high-performance query-by-example spoken term detection; low-resource languages; multilingual spoken resources; phone posterior representations; spoken Web search evaluation; untranscribed spoken resources; Calibration; Conferences; Decoding; Feature extraction; Speech; Vectors; Web search; dynamic time warping; phone posteriorgrams; score calibration and fusion; spoken term detection;

fLanguage :

English

Publisher :

IEEE

Conference_Titel :

Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference on

Conference_Location :

Florence

Type :

conf

DOI :

10.1109/ICASSP.2014.6855122

Filename :

6855122

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=180470