Exemplar-based large vocabulary speech recognition using k-nearest neighbors

Author

Xu, Yanbo ; Siohan, Olivier ; Simcha, David ; Kumar, Sanjiv ; Liao, Hank

Author_Institution

Dept. of Electr. & Comput. Eng., Univ. of Maryland Coll. Park, College Park, MD, USA

fYear

2015

fDate

19-24 April 2015

Firstpage

5167

Lastpage

5171

Abstract

This paper describes a large-scale exemplar-based acoustic modeling approach for large vocabulary continuous speech recognition. We construct an index of labeled training frames, using high-level features extracted from the bottleneck layer of a deep neural network as indexing features. At recognition time, each test frame is turned into a query and a set of k-nearest neighbor frames is retrieved from the index. This set is further filtered using majority voting, and the remaining frames are used to derive an estimate of the context-dependent state posteriors of the query, which can then be used for recognition. Using an approximate nearest neighbor search approach based on asymmetric hashing, we are able to construct an index on over 25,000 hours of training data. We present both frame classification and recognition experiments on a Voice Search task.
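
The retrieval, filtering, and posterior estimation steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions made only for illustration: exact brute-force search stands in for the approximate asymmetric-hashing search, each indexed frame is assumed to carry a coarse label (used for the majority vote) alongside its context-dependent state label, and the posterior is assumed to be a simple relative frequency over the surviving neighbors. The names `knn_state_posteriors` and `toy_index` are hypothetical.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_state_posteriors(query, index, k):
    """Estimate state posteriors for one query frame.

    index is a list of (feature_vector, coarse_label, state_label) tuples.
    The coarse label and the counting-based estimate are assumptions of
    this sketch, not details taken from the abstract.
    """
    # Step 1: retrieve the k nearest frames. Exact brute-force search is
    # used here as a stand-in for approximate nearest neighbor search.
    neighbors = sorted(index, key=lambda e: squared_distance(query, e[0]))[:k]

    # Step 2: majority vote over the (assumed) coarse labels, keeping only
    # the neighbors that agree with the winning label.
    majority_label, _ = Counter(e[1] for e in neighbors).most_common(1)[0]
    kept = [e for e in neighbors if e[1] == majority_label]

    # Step 3: relative frequency of each state label among the kept frames.
    counts = Counter(e[2] for e in kept)
    total = sum(counts.values())
    return {state: n / total for state, n in counts.items()}


# Toy index with 2-dimensional features (made-up values for illustration).
toy_index = [
    ([0.0, 0.0], "a", "a_1"),
    ([0.1, 0.0], "a", "a_1"),
    ([0.0, 0.2], "a", "a_2"),
    ([0.3, 0.3], "b", "b_1"),
    ([5.0, 5.0], "b", "b_2"),
]

posteriors = knn_state_posteriors([0.0, 0.0], toy_index, k=4)
```

In this toy case the vote discards the single "b" neighbor, so the estimate is spread only over the two "a" states. A real index at the scale described would need an approximate search structure rather than a full sort over all frames.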

Keywords

feature extraction; file organisation; neural nets; speech recognition; vocabulary; voice equipment; acoustic modeling; asymmetric hashing; context-dependent state posteriors; deep neural network; k-nearest neighbor; recognition time; vocabulary speech recognition; voice search task; Electronic publishing; Indexes; Information services; Market research; Training; exemplar-based recognition

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech and Signal Processing (ICASSP), 2015 IEEE International Conference on

Conference_Location

South Brisbane, QLD

Type

conf

DOI

10.1109/ICASSP.2015.7178956

Filename

7178956