Regional Information Center for Science and Technology - Using Broad Phonetic Group Experts for Improved Speech Recognition

DocumentCode :

1118284

Title :

Using Broad Phonetic Group Experts for Improved Speech Recognition

Author :

Scanlon, Patricia ; Ellis, Daniel P W ; Reilly, Richard B.

Author_Institution :

University College Dublin

Volume :

Issue :

fYear :

2007

fDate :

3/1/2007

Firstpage :

803

Lastpage :

812

Abstract :

In phoneme recognition experiments, it was found that approximately 75% of misclassified frames were assigned labels within the same broad phonetic group (BPG). While the phoneme can be described as the smallest distinguishable unit of speech, phonemes within a BPG share very similar characteristics and are easily confused. Different BPGs, however, such as vowels and stops, possess very different spectral and temporal characteristics. To accommodate the full range of phonemes, the acoustic models of speech recognition systems calculate input features from all frequencies over a large temporal context window. A new phoneme classifier is proposed, consisting of a modular arrangement of experts, with one expert assigned to each BPG and focused on discriminating between the phonemes within that BPG. Because each BPG has a different temporal and spectral structure, novel feature sets are extracted by using mutual information to select a relevant time-frequency (TF) feature set for each expert. To construct a phone recognition system, the output of each expert is combined with a baseline classifier under the guidance of a separate BPG detector. In phoneme recognition experiments on the TIMIT continuous speech corpus, the proposed architecture afforded significant relative error rate reductions of up to 5%.
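The mutual-information feature selection described in the abstract can be illustrated with a small sketch. It ranks candidate time-frequency cells by their estimated mutual information with the phoneme label, using only frames from one BPG, and keeps the top k cells for that BPG's expert. Everything concrete here is an assumption for illustration, not the authors' procedure: the `(time_offset, freq_band)` cell indexing, equal-width quantization into `n_bins` bins, the plug-in MI estimator, and the helper names `quantize`, `mutual_information`, and `select_tf_features` are all hypothetical.

```python
import math
from collections import Counter


def quantize(values, n_bins=8):
    """Map real-valued samples to equal-width bin indices 0..n_bins-1 (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)  # constant feature: everything falls in one bin
    width = (hi - lo) / n_bins
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(feature_bins, labels):
    """Plug-in estimate of I(F; Q) in bits from paired discrete samples."""
    n = len(labels)
    joint = Counter(zip(feature_bins, labels))
    f_counts = Counter(feature_bins)
    q_counts = Counter(labels)
    mi = 0.0
    for (f, q), c in joint.items():
        # p(f,q) * log2(p(f,q) / (p(f) p(q))), rewritten in terms of raw counts
        mi += (c / n) * math.log2(c * n / (f_counts[f] * q_counts[q]))
    return mi


def select_tf_features(features, labels, k, n_bins=8):
    """Rank TF cells by MI with the phoneme label and keep the top k.

    features: hypothetical dict mapping a (time_offset, freq_band) cell
              to one feature value per frame.
    labels:   phoneme label per frame, restricted to frames of a single BPG.
    """
    scores = {
        cell: mutual_information(quantize(vals, n_bins), labels)
        for cell, vals in features.items()
    }
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores


# Toy example: six frames from a "stops" BPG.
labels = ["b", "d", "g", "b", "d", "g"]
features = {
    (0, 3): [0.0, 4.0, 8.0, 0.0, 4.0, 8.0],  # separates all three stops
    (-5, 10): [1.0] * 6,                     # constant, carries no information
    (2, 7): [0.0, 0.0, 1.0, 1.0, 0.0, 1.0],  # partially informative
}
selected, scores = select_tf_features(features, labels, k=2)
print(selected)  # [(0, 3), (2, 7)]
```

Because selection runs separately on each BPG's frames, each expert can end up with a differently shaped TF region, for example a narrow time span for stops and a wider one for vowels. This matches the motivation stated in the abstract, although the actual regions chosen in the paper are not reproduced here.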

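The abstract states that each expert's output is combined with a baseline classifier under the guidance of a BPG detector, but it does not give the combination rule. The sketch below uses one simple rule, assumed purely for illustration. Each phone's expert-path score is the detector's posterior for that phone's BPG multiplied by the expert's within-group posterior. That score is then linearly interpolated with the baseline posterior using a hypothetical `weight`. If every input distribution sums to 1, the result also sums to 1, because summing the expert path over all phones gives the sum over g of P(g | x), which is 1. The function name and the interpolation are assumptions, not the paper's method.

```python
def combine_posteriors(baseline, bpg_post, experts, phone_to_bpg, weight=0.5):
    """Blend baseline phone posteriors with BPG-gated expert posteriors (assumed rule).

    baseline:     P_base(q | x) over all phones
    bpg_post:     P(g | x) from the BPG detector
    experts:      experts[g][q] = P_g(q | x), a distribution over the phones in BPG g
    phone_to_bpg: maps each phone to its BPG
    weight:       hypothetical interpolation weight for the expert path
    """
    combined = {}
    for phone, p_base in baseline.items():
        group = phone_to_bpg[phone]
        # Expert path: group probability times within-group phone probability.
        p_expert = bpg_post[group] * experts[group][phone]
        combined[phone] = weight * p_expert + (1.0 - weight) * p_base
    return combined


# Toy frame: the baseline favours "iy", but the vowel expert favours "ih".
baseline = {"iy": 0.4, "ih": 0.3, "b": 0.2, "d": 0.1}
phone_to_bpg = {"iy": "vowel", "ih": "vowel", "b": "stop", "d": "stop"}
bpg_post = {"vowel": 0.8, "stop": 0.2}
experts = {"vowel": {"iy": 0.25, "ih": 0.75}, "stop": {"b": 0.5, "d": 0.5}}

post = combine_posteriors(baseline, bpg_post, experts, phone_to_bpg)
print(max(post, key=post.get))  # ih
```

The toy frame shows the intended effect. The BPG detector is confident that the frame is a vowel, so the vowel expert's finer within-group decision overrides the baseline's choice. This targets exactly the within-BPG confusions that the abstract reports as roughly 75% of misclassified frames.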
Keywords :

error statistics; signal classification; speech recognition; time-frequency analysis; TIMIT continuous speech corpus; acoustic models; baseline classifier; broad phonetic group experts; error rate reductions; phone recognition system; phoneme recognition; speech recognition; temporal context window; time-frequency feature set; Acoustic signal detection; Context modeling; Data mining; Detectors; Error analysis; Feature extraction; Helium; Mutual information; Speech recognition; Time frequency analysis; Automatic speech recognition; broad phonetic groups (BPGs); mixture of experts; mutual information (MI);

fLanguage :

English

Journal_Title :

IEEE Transactions on Audio, Speech, and Language Processing

Publisher :

IEEE

ISSN :

1558-7916

Type :

journal

DOI :

10.1109/TASL.2006.885907

Filename :

4100697

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1118284