Title :
Fixed-dimensional acoustic embeddings of variable-length segments in low-resource settings
Author :
Levin, Keith ; Henry, Katharine ; Jansen, Aren ; Livescu, Karen
Author_Institution :
Human Language Technol. Center of Excellence, Johns Hopkins Univ., Baltimore, MD, USA
Abstract :
Measures of acoustic similarity between words or other units are critical for segmental exemplar-based acoustic models, spoken term discovery, and query-by-example search. Dynamic time warping (DTW) alignment cost has been the most commonly used measure, but it has well-known inadequacies. Some recently proposed alternatives require large amounts of training data. In the interest of finding more efficient, accurate, and low-resource alternatives, we consider the problem of embedding speech segments of arbitrary length into fixed-dimensional spaces in which simple distances (such as cosine or Euclidean) serve as a proxy for linguistically meaningful (phonetic, lexical, etc.) dissimilarities. Such embeddings would enable efficient audio indexing and permit application of standard distance learning techniques to segmental acoustic modeling. In this paper, we explore several supervised and unsupervised approaches to this problem and evaluate them on an acoustic word discrimination task. We identify several embedding algorithms that match or improve upon the DTW baseline in low-resource settings.
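As a rough illustration of the contrast the abstract draws, the sketch below computes a DTW alignment cost between two variable-length feature sequences and, separately, a cosine distance between fixed-dimensional embeddings of the same segments. The embedding used here (uniformly sampling k frames and concatenating them) is an assumption chosen for simplicity and is not claimed to be one of the methods evaluated in the paper. The function names, the 13-dimensional random features, and k = 10 are likewise hypothetical.

```python
import numpy as np

def dtw_cost(x, y):
    """DTW alignment cost between feature sequences of shape (T_x, d) and (T_y, d)."""
    n, m = len(x), len(y)
    # Pairwise Euclidean distances between frames, shape (n, m).
    dist = np.linalg.norm(x[:, None, :] - y[None, :, :], axis=2)
    # acc[i, j] holds the cheapest cost of aligning x[:i] with y[:j].
    acc = np.full((n + 1, m + 1), np.inf)
    acc[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best_prev = min(acc[i - 1, j], acc[i, j - 1], acc[i - 1, j - 1])
            acc[i, j] = dist[i - 1, j - 1] + best_prev
    return acc[n, m]

def downsample_embedding(x, k=10):
    """Map a (T, d) segment to a fixed (k * d,) vector by uniformly sampling k frames.

    Illustrative assumption only; not necessarily a method from the paper.
    """
    idx = np.linspace(0, len(x) - 1, k).round().astype(int)
    return x[idx].reshape(-1)

def cosine_distance(u, v):
    """One minus the cosine similarity of two non-zero vectors."""
    return 1.0 - float(u @ v) / (np.linalg.norm(u) * np.linalg.norm(v))

# Two hypothetical segments of different lengths (random stand-ins for real features).
rng = np.random.default_rng(0)
seg_a = rng.normal(size=(40, 13))
seg_b = rng.normal(size=(55, 13))

cost = dtw_cost(seg_a, seg_b)                      # quadratic in segment length
emb_a = downsample_embedding(seg_a)                # shape (130,)
emb_b = downsample_embedding(seg_b)                # shape (130,)
embedding_distance = cosine_distance(emb_a, emb_b) # linear in embedding size
```

Because every segment maps to a vector of the same size, the embeddings can be stored in a standard vector index and compared without running a pairwise alignment, which is the efficiency argument the abstract makes.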
Keywords :
audio signal processing; distance learning; indexing; query processing; speech recognition; unsupervised learning; DTW; acoustic word discrimination task; audio indexing; distance learning technique; dynamic time warping; fixed-dimensional acoustic embedding; low-resource setting; query-by-example search; segmental exemplar-based acoustic model; speech segment; spoken term discovery; variable-length segment; Acoustics; Laplace equations; Principal component analysis; Speech; Time series analysis; Training; Vectors; Fixed-dimensional embedding; segmental acoustic modeling; speech indexing
Conference_Title :
Automatic Speech Recognition and Understanding (ASRU), 2013 IEEE Workshop on
Conference_Location :
Olomouc
DOI :
10.1109/ASRU.2013.6707765