Title :
Multilingual query by example spoken term detection for under-resourced languages
Author :
Buzo, Andi ; Cucu, H. ; Safta, Mihai ; Burileanu, C.
Author_Institution :
Speech & Dialogue (SpeeD) Research Laboratory, University Politehnica of Bucharest, Bucharest, Romania
Abstract :
We propose a query-by-example approach, based on Automatic Speech Recognition, to multilingual Spoken Term Detection for under-resourced languages. The approach addresses the main difficulties met under these conditions: it provides a new method for building multilingual acoustic models from little annotated data, and it searches in approximate Automatic Speech Recognition transcriptions, which gives high scalability. The acoustic models are obtained by adapting well-trained phoneme models to the phonemes of the envisaged languages. The phoneme mapping is made according to the International Phonetic Alphabet phoneme classification and a confusion matrix. Weighting by query length and alignment spread is incorporated into the Dynamic Time Warping technique to improve the search. Experimental validation was conducted on a standard data set consisting of 3 hours of speech in mixed African languages. The recorded speech is of telephone quality and is a mix of read and spontaneous speech.
Keywords :
natural language processing; pattern classification; query processing; speech processing; speech recognition; International Phonetic Alphabet phoneme classification; annotated data; approximate automatic speech recognition transcriptions; confusion matrix; dynamic time warping technique; envisaged languages; mixed African languages; multilingual acoustic models; multilingual spoken term detection; query alignment spread; query length weighting; query-by-example approach; read-spontaneous speech mixture; searching method; speech recording; telephonic quality; under-resourced languages; well-trained phonemes; Acoustics; Adaptation models; Data models; Indexing; Speech; Speech processing; multilingual acoustic model; spoken term detection
Conference_Title :
Speech Technology and Human-Computer Dialogue (SpeD), 2013 7th Conference on
Conference_Location :
Cluj-Napoca
DOI :
10.1109/SpeD.2013.6682655