DocumentCode
730816
Title
Double-layer neighborhood graph based similarity search for fast query-by-example spoken term detection
Author
Aoyama, Kazuo ; Ogawa, Atsunori ; Hattori, Takashi ; Hori, Takaaki
Author_Institution
NTT Commun. Sci. Labs., NTT Corp., Kyoto, Japan
fYear
2015
fDate
19-24 April 2015
Firstpage
5216
Lastpage
5220
Abstract
This paper presents a novel double-layer neighborhood graph index that accelerates similarity search and thereby enables fast query-by-example spoken term detection (STD). Given a query segment, the proposed STD method finds segments similar to the query in an utterance data set by an efficient similarity search that traverses the double-layer neighborhood graph (DLG) at low computational cost. A segment is a sequence of Gaussian mixture model posteriorgram frames and corresponds to a vertex in the DLG. The dissimilarity between vertices is measured by dynamic time warping. The DLG consists of two distinct degree-reduced k-nearest neighbor graphs, one in a base layer and one in an upper layer. The base layer's graph contains all the vertices in the data set, while the upper layer's graph includes only representatives extracted from the vertices in the base layer. By way of analogy, search in the DLG resembles driving on general roads and expressways as appropriate to save travel time. Experimental results on the MIT lecture corpus demonstrate that the proposed method reduces CPU time by 40% compared with the most recent method and by more than 60% compared with the ordinary graph-based method, while keeping almost the same precision.
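The search idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the local frame distance (negative log inner product of posterior vectors), the plain greedy traversal, the rule for dropping from the upper layer to the base layer, and all function and parameter names are assumptions chosen for illustration. Graph construction and degree reduction are not shown.

```python
import math

def frame_distance(p, q, eps=1e-10):
    # Assumed local distance between two posterior vectors:
    # negative log of their inner product (the abstract does not specify one).
    dot = sum(a * b for a, b in zip(p, q))
    return -math.log(max(dot, eps))

def dtw(x, y):
    # Standard dynamic time warping with steps (1,0), (0,1), (1,1).
    # x and y are sequences of posterior vectors; returns the accumulated cost.
    inf = float("inf")
    m = len(y)
    prev = [inf] * (m + 1)
    prev[0] = 0.0
    for i in range(1, len(x) + 1):
        cur = [inf] * (m + 1)
        for j in range(1, m + 1):
            cost = frame_distance(x[i - 1], y[j - 1])
            cur[j] = cost + min(prev[j], cur[j - 1], prev[j - 1])
        prev = cur
    return prev[m]

def greedy_search(query, start, graph, segments, dist):
    # Hypothetical helper: repeatedly move to the neighbor closest to the
    # query and stop when no neighbor of the current vertex improves.
    best, best_d = start, dist(query, segments[start])
    improved = True
    while improved:
        improved = False
        for v in graph[best]:
            d = dist(query, segments[v])
            if d < best_d:
                best, best_d, improved = v, d, True
    return best, best_d

def two_layer_search(query, upper_graph, base_graph, segments, entry):
    # Assumed transition rule: search the sparse upper layer first (the
    # "expressway"), then continue in the base layer (the "general roads")
    # from the vertex where the upper-layer search stopped.
    top, _ = greedy_search(query, entry, upper_graph, segments, dtw)
    return greedy_search(query, top, base_graph, segments, dtw)

# Toy data: five one-frame segments; vertices 0 and 3 act as representatives.
segments = {0: [[1.0, 0.0]], 1: [[0.8, 0.2]], 2: [[0.5, 0.5]],
            3: [[0.2, 0.8]], 4: [[0.0, 1.0]]}
base_graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
upper_graph = {0: [3], 3: [0]}
result = two_layer_search([[0.0, 1.0]], upper_graph, base_graph, segments, entry=0)
```

In this toy example the upper layer lets the search jump from vertex 0 to vertex 3 in one step instead of walking the base-layer chain, which is the travel-time analogy in miniature. A practical search would typically track a candidate set rather than a single best vertex.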
Keywords
Gaussian processes; graph theory; mixture models; query processing; set theory; speech recognition; CPU time reduction; DLG; Gaussian mixture model posteriorgram frames; MIT lecture corpus; STD method; base layer; degree-reduced k-nearest neighbor graphs; dissimilarity measurement; double-layer neighborhood graph based similarity search; double-layer neighborhood graph index; dynamic time warping; fast query-by-example spoken term detection; low computational cost; query segment; travel-time saving; upper layer; utterance data set; Indexes; Dynamic time warping; Neighborhood graph; Query-by-example search; Search index; Spoken term detection;
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on
Conference_Location
South Brisbane, QLD
Type
conf
DOI
10.1109/ICASSP.2015.7178966
Filename
7178966