Regional Information Center for Science and Technology (RICeST) - Hierarchical integration of phonetic and lexical knowledge in phone posterior estimation

DocumentCode :

3422314

Title :

Hierarchical integration of phonetic and lexical knowledge in phone posterior estimation

Author :

Ketabdar, Hamed ; Bourlard, Hervé

Author_Institution :

IDIAP Res. Inst., Martigny

fYear :

2008

fDate :

March 31 2008-April 4 2008

Firstpage :

4065

Lastpage :

4068

Abstract :

Phone posteriors have recently been used quite often (as additional features or as local scores) to improve state-of-the-art automatic speech recognition (ASR) systems. Usually, better phone posterior estimates yield better ASR performance. In this paper we present some initial, yet promising, work towards hierarchically improving these phone posteriors by implicitly integrating phonetic and lexical knowledge. In the approach investigated here, phone posteriors estimated with a multilayer perceptron (MLP) and a short (9-frame) temporal context are used as input to a second MLP, which spans a longer temporal context (e.g. 19 frames of posteriors) and is trained to refine the phone posterior estimates. The rationale is that at the output of every MLP the information stream gets simpler (converging to a sequence of binary posterior vectors), and can thus be further processed (using a simpler classifier) by looking at a larger temporal window. Longer-term dependencies can be interpreted as phonetic, sub-lexical and lexical knowledge. The resulting enhanced posteriors can then be used for phone and word recognition, in the same way as regular phone posteriors, in hybrid HMM/ANN or Tandem systems. The proposed method has been tested on the TIMIT, OGI Numbers and Conversational Telephone Speech (CTS) databases, always resulting in consistent and significant improvements in both phone and word recognition rates.
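
The two-stage structure described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the 9-frame and 19-frame context widths come from the abstract, while the feature dimension, number of phone classes, hidden-layer size, tanh activation, edge padding and the randomly initialised (untrained) weights are all assumptions chosen only to show how the data flows from the first MLP to the second.

```python
import numpy as np

def stack_context(frames, width):
    """Concatenate `width` consecutive frames, centered on each frame.

    Edges are handled by repeating the first/last frame (an assumption).
    Input shape (T, D) -> output shape (T, width * D).
    """
    if width % 2 != 1:
        raise ValueError("context width must be odd")
    half = width // 2
    padded = np.pad(frames, ((half, half), (0, 0)), mode="edge")
    n = len(frames)
    return np.concatenate([padded[i:i + n] for i in range(width)], axis=1)

def softmax(x):
    """Row-wise softmax, shifted for numerical stability."""
    x = x - x.max(axis=1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=1, keepdims=True)

def mlp_posteriors(x, w1, b1, w2, b2):
    """One-hidden-layer MLP returning a posterior vector per frame."""
    hidden = np.tanh(x @ w1 + b1)
    return softmax(hidden @ w2 + b2)

def random_mlp(rng, n_in, n_hidden, n_out):
    """Hypothetical helper: untrained weights, for shape illustration only."""
    return (rng.normal(0.0, 0.1, (n_in, n_hidden)), np.zeros(n_hidden),
            rng.normal(0.0, 0.1, (n_hidden, n_out)), np.zeros(n_out))

rng = np.random.default_rng(0)
T, D, P, H = 50, 39, 40, 64   # frames, feature dim, phone classes, hidden units (all assumed)
features = rng.normal(size=(T, D))   # stand-in for acoustic features

# Stage 1: acoustic features with a short 9-frame context -> phone posteriors.
mlp1 = random_mlp(rng, 9 * D, H, P)
posteriors = mlp_posteriors(stack_context(features, 9), *mlp1)

# Stage 2: 19 frames of stage-1 posteriors -> refined ("enhanced") posteriors.
mlp2 = random_mlp(rng, 19 * P, H, P)
enhanced = mlp_posteriors(stack_context(posteriors, 19), *mlp2)
```

In a real system both networks would be trained on phone-labelled frames, the second one on the outputs of the first. The sketch only shows that the second MLP takes a wider window of posterior vectors, rather than acoustic features, as its input.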

Keywords :

hidden Markov models; multilayer perceptrons; speech recognition; ANN; Conversational Telephone Speech; HMM; OGI Numbers; TIMIT; automatic speech recognition; binary posterior vectors; hierarchical integration; lexical knowledge; multilayer perceptron; phone posterior estimation; phonetic knowledge; word recognition; Artificial neural networks; Automatic speech recognition; Databases; Hidden Markov models; Multilayer perceptrons; Neural networks; Speech recognition; State estimation; Testing; Yield estimation; Enhanced phone posteriors; Neural Networks; Phone posterior estimation; Phonetic and lexical knowledge; Temporal posterior context;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 2008. ICASSP 2008. IEEE International Conference on

Conference_Location :

Las Vegas, NV

ISSN :

1520-6149

Print_ISBN :

978-1-4244-1483-3

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2008.4518547

Filename :

4518547

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3422314