Regional Information Center for Science and Technology - Speech recognition with auxiliary information

DocumentCode :

959761

Title :

Speech recognition with auxiliary information

Author :

Stephenson, Todd A. ; Doss, Mathew Magimai ; Bourlard, Hervé

Author_Institution :

Dalle Molle Inst. for Perceptual Artificial Intelligence, Martigny, Switzerland

Volume :

Issue :

Year :

2004

Date :

5/1/2004

Firstpage :

189

Lastpage :

203

Abstract :

State-of-the-art automatic speech recognition (ASR) systems are usually based on hidden Markov models (HMMs) that emit cepstral-based features which are assumed to be piecewise stationary. While not really robust to noise, these features are also known to be very sensitive to "auxiliary" information, such as pitch, energy, rate-of-speech (ROS), etc. Attempts so far to include such auxiliary information in state-of-the-art ASR systems have often been based on simply appending these auxiliary features to the standard acoustic feature vectors. In the present paper, we investigate different approaches to incorporating this auxiliary information using dynamic Bayesian networks (DBNs) or hybrid HMM/ANNs (HMMs with artificial neural networks). These approaches are motivated by the fact that the auxiliary information is not necessarily (directly) emitted by the HMM states but, rather, carries higher-level information (e.g., speaker characteristics) that is correlated with the standard features. As implicitly done for gender modeling elsewhere, this auxiliary information then appears as a conditional variable in the emission distributions and can be hidden (except in the case of some HMM/ANNs) as its estimates become too noisy. Based on recognition experiments carried out on the OGI Numbers database (free format numbers spoken over the telephone), we show that auxiliary information that conditions the distribution of the standard features can, in certain conditions, provide more robust recognition than using auxiliary information that is appended to the standard features; this is most evident in the case of energy as an auxiliary variable in noisy speech.
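
The contrast the abstract draws, between appending auxiliary information to the features and letting it condition the emission distribution, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not the authors' implementation: a scalar feature `x`, a two-level discrete auxiliary variable `a` (for example, quantized energy), a single HMM state with made-up Gaussian parameters, and the hypothetical function names `emission_appended`, `emission_conditioned`, and `emission_aux_hidden`. The "appended" case is simplified to an auxiliary value emitted independently of the feature given the state.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) evaluated at x."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


# Hypothetical parameters for a single HMM state q (illustrative values only).
AUX_PRIOR = {"low": 0.4, "high": 0.6}                  # p(a | q)
COND_PARAMS = {"low": (0.0, 1.0), "high": (2.0, 1.5)}  # (mean, var) of p(x | a, q)
POOLED_PARAMS = (1.2, 2.26)  # (mean, var) of p(x | q), moment-matched to the above


def emission_appended(x, a):
    """Auxiliary value treated as one more emitted quantity: p(x | q) * p(a | q).

    Assumes x and a are independent given the state, a simplification
    chosen for this sketch.
    """
    mean, var = POOLED_PARAMS
    return gaussian_pdf(x, mean, var) * AUX_PRIOR[a]


def emission_conditioned(x, a):
    """Auxiliary value observed and used as a conditioning variable: p(x | a, q)."""
    mean, var = COND_PARAMS[a]
    return gaussian_pdf(x, mean, var)


def emission_aux_hidden(x):
    """Auxiliary value hidden and marginalized out: sum_a p(a | q) * p(x | a, q)."""
    return sum(AUX_PRIOR[a] * emission_conditioned(x, a) for a in AUX_PRIOR)
```

In this sketch, hiding the auxiliary variable amounts to replacing the single conditional Gaussian with a weighted sum over its possible values, which is one way to avoid relying on an estimate that has become too noisy.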

Keywords :

Gaussian processes; belief networks; cepstral analysis; hidden Markov models; neural nets; speech processing; speech recognition; Gaussian mixture models; OGI numbers database; artificial neural networks; automatic speech recognition system; auxiliary information; cepstral-based features; dynamic Bayesian networks; emissions distributions; noisy speech; piecewise stationary; pitch; rate-of-speech; Acoustic emission; Acoustic noise; Automatic speech recognition; Bayesian methods; Noise robustness; Spatial databases; Telephony

Language :

English

Journal_Title :

Speech and Audio Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1063-6676

Type :

Journal article

DOI :

10.1109/TSA.2003.822631

Filename :

1288148

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=959761