Regional Information Center for Science and Technology - Feature and score level combination of subspace Gaussians in LVCSR task

DocumentCode :

1691526

Title :

Feature and score level combination of subspace Gaussians in LVCSR task

Author :

Motlicek, Petr ; Povey, Daniel ; Karafiat, Martin

Author_Institution :

Idiap Res. Inst., Martigny, Switzerland

fYear :

2013

Firstpage :

7604

Lastpage :

7608

Abstract :

In this paper, we investigate the use of discriminatively trained acoustic features modeled by Subspace Gaussian Mixture Models (SGMMs) for Rich Transcription meeting recognition. More specifically, we first focus on exploiting various types of complex features estimated using neural networks, combined with conventional cepstral features and modeled by standard HMM/GMMs and SGMMs. Then, the outputs (word sequences) of individual recognizers trained on different features are combined at the score level using ROVER, for both acoustic modeling techniques. Experimental results indicate three important findings: (1) SGMMs consistently outperform HMM/GMMs (an average relative WER improvement of about 6%) when both techniques are applied to single features; (2) SGMMs benefit much less from feature-level combination (1% relative improvement) than HMM/GMMs (4% relative improvement), which, with feature-level combination, can eventually match the performance of SGMMs; (3) SGMMs can be significantly improved when individual systems are combined at the score level. This suggests that the SGMM systems provide complementary recognition outputs. Overall relative improvements of the combined SGMM and HMM/GMM systems are 21% and 17%, respectively, compared to a standard ASR baseline.
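The abstract relies on two mechanics: score-level combination of word sequences with ROVER (Recognizer Output Voting Error Reduction), and results reported as relative WER improvements. The sketch below illustrates both under simplifying assumptions; it is not the authors' setup. Real ROVER first aligns the system outputs into a word transition network with iterative dynamic-programming alignment and can also weight votes by word confidence; here the hypotheses are assumed to be already aligned slot by slot and only plain frequency voting is shown. The function names `vote_aligned` and `relative_improvement` are hypothetical, and the example figures are invented for illustration, not taken from the paper.

```python
from collections import Counter
from typing import List, Optional


def vote_aligned(hypotheses: List[List[Optional[str]]]) -> List[str]:
    """Hypothetical ROVER-style majority vote over pre-aligned hypotheses.

    ASSUMPTION: each system's output is already aligned to the same number
    of slots (real ROVER builds this alignment itself). None marks a
    deletion (a NULL arc) in that slot.
    """
    if not hypotheses:
        return []
    length = len(hypotheses[0])
    if any(len(h) != length for h in hypotheses):
        raise ValueError("hypotheses must be aligned to equal length")
    output: List[str] = []
    for slot in zip(*hypotheses):
        # Count how many systems proposed each word in this slot.
        counts = Counter(slot)
        # Counter.most_common keeps first-encountered order among equal
        # counts, so ties go to the earlier-listed system.
        word, _ = counts.most_common(1)[0]
        # A winning NULL arc means the word is deleted from the output.
        if word is not None:
            output.append(word)
    return output


def relative_improvement(baseline_wer: float, new_wer: float) -> float:
    """Relative WER reduction as a fraction of the baseline WER."""
    return (baseline_wer - new_wer) / baseline_wer


if __name__ == "__main__":
    systems = [
        ["the", "cat", "sat", None],
        ["the", "hat", "sat", "down"],
        ["a", "cat", "sat", "down"],
    ]
    print(vote_aligned(systems))           # ['the', 'cat', 'sat', 'down']
    print(relative_improvement(30.0, 27.0))
```

With three systems, each slot's majority word wins, so single-system errors ("hat", "a", the missing "down") are voted out. This is why score-level combination pays off most when the individual recognizers make different errors. In the illustrative call, a drop from 30.0% to 27.0% WER is a 10% relative improvement, which is the kind of figure the abstract quotes.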

Keywords :

Gaussian distribution; cepstral analysis; hidden Markov models; neural nets; speech recognition; LVCSR task; ROVER; SGMM; acoustic features; automatic speech recognition; cepstral features; complementary recognition outputs; complex features; feature level combination; feature-level combination; large vocabulary continuous speech recognition; neural network; rich transcription; score level combination; standard ASR baseline; standard HMM-GMM; subspace Gaussian mixture models; word sequences; Abstracts; Hidden Markov models; Mel frequency cepstral coefficient; Automatic Speech Recognition; Discriminative features; System combination;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEE International Conference on

Conference_Location :

Vancouver, BC

ISSN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2013.6639142

Filename :

6639142

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1691526