Regional Information Center for Science and Technology (RICEST) - Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings

DocumentCode :

672388

Title :

Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings

Author :

Levin, Keith ; Henry, Katharine ; Jansen, Aren ; Livescu, Karen

Author_Institution :

Human Language Technol. Center of Excellence, Johns Hopkins Univ., Baltimore, MD, USA

fYear :

2013

fDate :

8-12 Dec. 2013

Firstpage :

410

Lastpage :

415

Abstract :

Measures of acoustic similarity between words or other units are critical for segmental exemplar-based acoustic models, spoken term discovery, and query-by-example search. Dynamic time warping (DTW) alignment cost has been the most commonly used measure, but it has well-known inadequacies. Some recently proposed alternatives require large amounts of training data. In the interest of finding more efficient, accurate, and low-resource alternatives, we consider the problem of embedding speech segments of arbitrary length into fixed-dimensional spaces in which simple distances (such as cosine or Euclidean) serve as a proxy for linguistically meaningful (phonetic, lexical, etc.) dissimilarities. Such embeddings would enable efficient audio indexing and permit application of standard distance learning techniques to segmental acoustic modeling. In this paper, we explore several supervised and unsupervised approaches to this problem and evaluate them on an acoustic word discrimination task. We identify several embedding algorithms that match or improve upon the DTW baseline in low-resource settings.
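
To make the contrast in the abstract concrete, here is a minimal sketch comparing a DTW alignment cost with a cosine distance between fixed-dimensional embeddings. The embedding shown (concatenating k evenly spaced frames) is only an illustrative stand-in and is not claimed to be one of the algorithms the paper evaluates. The function names `dtw_cost`, `downsample_embed`, and `cosine_distance`, the parameter `k`, and the toy two-dimensional frames are all assumptions made for this example.

```python
import math


def dtw_cost(a, b):
    """DTW alignment cost between two sequences of frames (Euclidean frame distance)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def downsample_embed(segment, k=4):
    """Illustrative embedding (an assumption, not the paper's method):
    pick k evenly spaced frames (k >= 2) and concatenate them, so a segment
    of any length maps to a vector of k * frame_dim values."""
    n = len(segment)
    idx = [round(i * (n - 1) / (k - 1)) for i in range(k)]
    return [x for i in idx for x in segment[i]]


def cosine_distance(u, v):
    """1 - cosine similarity; assumes both vectors are nonzero."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    return 1.0 - dot / (norm_u * norm_v)


# Two toy "segments" of different lengths with the same frame pattern.
seg_a = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [0.0, 1.0]]
seg_b = [[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

emb_a = downsample_embed(seg_a, k=4)
emb_b = downsample_embed(seg_b, k=4)

print("DTW cost:", dtw_cost(seg_a, seg_b))
print("Embedding length:", len(emb_a), len(emb_b))
print("Cosine distance:", cosine_distance(emb_a, emb_b))
```

DTW fills a table whose size grows with the product of the two segment lengths for every pair compared. The embedding is computed once per segment, after which each comparison is a single vector distance. That difference is what makes indexing large audio collections practical.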

Keywords :

audio signal processing; distance learning; indexing; query processing; speech recognition; unsupervised learning; DTW; acoustic word discrimination task; audio indexing; distance learning technique; dynamic time warping; fixed-dimensional acoustic embedding; low-resource setting; query-by-example search; segmental exemplar-based acoustic model; speech segment; spoken term discovery; variable-length segment; Acoustics; Laplace equations; Principal component analysis; Speech; Time series analysis; Training; Vectors; Fixed-dimensional embedding; query-by-example search; segmental acoustic modeling; speech indexing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on

Conference_Location :

Olomouc

Type :

conf

DOI :

10.1109/ASRU.2013.6707765

Filename :

6707765

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=672388