DocumentCode :
2182775
Title :
Learning a better representation of speech soundwaves using restricted Boltzmann machines
Author :
Jaitly, Navdeep ; Hinton, Geoffrey
Author_Institution :
Dept. of Comput. Sci., Univ. of Toronto, Toronto, ON, Canada
fYear :
2011
fDate :
22-27 May 2011
Firstpage :
5884
Lastpage :
5887
Abstract :
State-of-the-art speech recognition systems rely on preprocessed speech features, such as Mel cepstrum or linear predictive coding coefficients, that collapse high-dimensional speech sound waves into low-dimensional encodings. While these have been successfully applied in speech recognition systems, such low-dimensional encodings may lose some relevant information and express other information in a way that makes it difficult to use for discrimination. Higher-dimensional encodings could both improve performance in recognition tasks and be applied to speech synthesis by better modeling the statistical structure of the sound waves. In this paper we present a novel approach for modeling speech sound waves using a restricted Boltzmann machine (RBM) with a novel type of hidden variable, and we report initial results demonstrating phoneme recognition performance better than the current state of the art for methods based on Mel cepstrum coefficients.
Keywords :
Boltzmann machines; speech recognition; speech synthesis; Mel cepstrum coefficient; hidden variable; phoneme recognition; restricted Boltzmann machine; speech recognition system; speech sound wave; Artificial neural networks; Encoding; Hidden Markov models; Mathematical model; Speech; Training; RBM; TIMIT
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on
Conference_Location :
Prague
ISSN :
1520-6149
Print_ISBN :
978-1-4577-0538-0
Electronic_ISBN :
1520-6149
Type :
conf
DOI :
10.1109/ICASSP.2011.5947700
Filename :
5947700