Title :
Hierarchical integration of phonetic and lexical knowledge in phone posterior estimation
Author :
Ketabdar, Hamed ; Bourlard, Hervé
Author_Institution :
IDIAP Res. Inst., Martigny
Date :
March 31 2008-April 4 2008
Abstract :
Phone posteriors have recently been used quite often (as additional features or as local scores) to improve state-of-the-art automatic speech recognition (ASR) systems. Usually, better phone posterior estimates yield better ASR performance. In this paper we present some initial, yet promising, work towards hierarchically improving these phone posteriors by implicitly integrating phonetic and lexical knowledge. In the approach investigated here, phone posteriors estimated with a multilayer perceptron (MLP) and a short (9-frame) temporal context are used as input to a second MLP, which spans a longer temporal context (e.g. 19 frames of posteriors) and is trained to refine the phone posterior estimates. The rationale is that at the output of every MLP the information stream becomes simpler (converging to a sequence of binary posterior vectors) and can thus be further processed, using a simpler classifier, by looking at a larger temporal window. Longer-term dependencies can be interpreted as phonetic, sub-lexical and lexical knowledge. The resulting enhanced posteriors can then be used for phone and word recognition, in the same way as regular phone posteriors, in hybrid HMM/ANN or Tandem systems. The proposed method has been tested on the TIMIT, OGI Numbers and Conversational Telephone Speech (CTS) databases, always resulting in consistent and significant improvements in both phone and word recognition rates.
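The two-stage structure described in the abstract can be sketched as follows. This is only a minimal illustration, not the authors' implementation: the feature dimension, number of phone classes, hidden layer size, edge padding scheme and the helper names (stack_context, init_mlp, mlp_posteriors) are all assumptions, and the weights are random rather than trained. Only the window widths (9 frames of acoustic features, 19 frames of posteriors) come from the abstract.

```python
import math
import random


def stack_context(frames, width):
    """Concatenate each frame with its neighbours in a centred window.

    Edges are padded by repeating the first/last frame (an assumption;
    the abstract does not say how edges are handled).
    """
    half = width // 2
    n = len(frames)
    stacked = []
    for t in range(n):
        window = []
        for k in range(t - half, t + half + 1):
            window.extend(frames[min(max(k, 0), n - 1)])
        stacked.append(window)
    return stacked


def softmax(z):
    """Numerically stable softmax, so outputs sum to 1 like posteriors."""
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def init_mlp(n_in, n_hidden, n_out, rng):
    """Random, untrained weights for a one-hidden-layer MLP (no biases)."""
    def layer(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]
    return {"W1": layer(n_hidden, n_in), "W2": layer(n_out, n_hidden)}


def mlp_posteriors(mlp, x):
    """Forward pass: tanh hidden layer followed by a softmax output layer."""
    h = [math.tanh(sum(w * v for w, v in zip(row, x))) for row in mlp["W1"]]
    return softmax([sum(w * v for w, v in zip(row, h)) for row in mlp["W2"]])


rng = random.Random(0)
N_FRAMES, N_FEATS, N_PHONES, N_HIDDEN = 30, 13, 5, 8  # hypothetical sizes

# Stand-in acoustic features: one vector per frame.
acoustic = [[rng.gauss(0.0, 1.0) for _ in range(N_FEATS)] for _ in range(N_FRAMES)]

# Stage 1: MLP over 9 frames of acoustic features -> phone posteriors.
mlp1 = init_mlp(9 * N_FEATS, N_HIDDEN, N_PHONES, rng)
first_pass = [mlp_posteriors(mlp1, x) for x in stack_context(acoustic, 9)]

# Stage 2: MLP over 19 frames of stage-1 posteriors -> refined posteriors.
mlp2 = init_mlp(19 * N_PHONES, N_HIDDEN, N_PHONES, rng)
enhanced = [mlp_posteriors(mlp2, x) for x in stack_context(first_pass, 19)]
```

In this sketch the second network sees 19 x 5 = 95 inputs per frame, all of them posteriors from the first network. In a real system both networks would be trained against phone labels, and the refined posteriors would then feed a hybrid HMM/ANN or Tandem recognizer as the abstract describes.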
Keywords :
hidden Markov models; multilayer perceptrons; speech recognition; ANN; Conversational Telephone Speech; HMM; OGI Numbers; TIMIT; automatic speech recognition; binary posterior vectors; hierarchical integration; lexical knowledge; multilayer perceptron; phone posterior estimation; phonetic knowledge; word recognition; Artificial neural networks; Automatic speech recognition; Databases; Hidden Markov models; Multilayer perceptrons; Neural networks; Speech recognition; State estimation; Testing; Yield estimation; Enhanced phone posteriors; Neural Networks; Phone posterior estimation; Phonetic and lexical knowledge; Temporal posterior context;
Conference_Title :
Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on
Conference_Location :
Las Vegas, NV
Print_ISBN :
978-1-4244-1483-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2008.4518547