Regional Information Center for Science and Technology - Combining Spectral Representations for Large-Vocabulary Continuous Speech Recognition

DocumentCode :

1051883

Title :

Combining Spectral Representations for Large-Vocabulary Continuous Speech Recognition

Author :

Garau, Giulia ; Renals, Steve

Author_Institution :

Univ. of Edinburgh, Edinburgh

Volume :

Issue :

fYear :

2008

fDate :

3/1/2008

Firstpage :

508

Lastpage :

518

Abstract :

In this paper, we investigate the combination of complementary acoustic feature streams in large-vocabulary continuous speech recognition (LVCSR). We have explored acoustic features obtained from a pitch-synchronous analysis, STRAIGHT, in combination with conventional features such as Mel frequency cepstral coefficients. Pitch-synchronous acoustic features are of particular interest when used with vocal tract length normalization (VTLN), which is known to be affected by the fundamental frequency. We have combined these spectral representations directly at the acoustic feature level using heteroscedastic linear discriminant analysis (HLDA) and at the system level using ROVER. We evaluated this approach on three LVCSR tasks: dictated newspaper text (WSJCAM0), conversational telephone speech (CTS), and multiparty meeting transcription. The CTS and meeting transcription experiments were both evaluated using standard NIST test sets and evaluation protocols. Our results indicate that combining conventional and pitch-synchronous acoustic feature sets using HLDA yields a consistent, significant decrease in word error rate across all three tasks. Combining at the system level using ROVER yielded a further significant decrease in word error rate.
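The system-level combination with ROVER mentioned in the abstract can be illustrated by a minimal sketch of ROVER's voting stage. It assumes the word hypotheses from each recognizer have already been aligned into a common word transition network of equal-length slots (ROVER's dynamic-programming alignment step is not shown), scores candidate words by frequency of occurrence only (confidence scores are ignored), and breaks ties in favour of the earlier system. The function name `rover_vote` and the null token `@` are hypothetical; this is not the authors' implementation.

```python
from collections import Counter
from typing import List

NULL = "@"  # hypothetical null-arc token marking a deletion in the aligned network


def rover_vote(aligned: List[List[str]]) -> List[str]:
    """Frequency-of-occurrence vote over pre-aligned system hypotheses.

    Assumes every hypothesis has already been aligned to the same number of
    slots, with NULL filling the gaps (the alignment itself is not shown).
    """
    if not aligned:
        return []
    n_slots = len(aligned[0])
    if any(len(h) != n_slots for h in aligned):
        raise ValueError("all hypotheses must be aligned to the same length")

    output = []
    for slot in range(n_slots):
        column = [h[slot] for h in aligned]  # one candidate word per system
        counts = Counter(column)
        best_count = max(counts.values())
        # Tie-break (an assumption): the earliest system listed wins.
        winner = next(w for w in column if counts[w] == best_count)
        if winner != NULL:  # a winning null arc means "emit nothing here"
            output.append(winner)
    return output


if __name__ == "__main__":
    hyps = [
        ["the", "cat", "sat", NULL],
        ["the", "hat", "sat", "down"],
        ["a", "cat", "sat", "down"],
    ]
    print(rover_vote(hyps))
```

With three systems, each slot is decided by a simple majority, so isolated errors made by only one system (here "hat", "a", and the dropped "down") are voted out. This is the intuition behind combining systems built on complementary feature sets: the benefit depends on their errors being at least partly uncorrelated.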

Keywords :

speech recognition; text analysis; Mel frequency cepstral coefficients; complementary acoustic feature streams; conversational telephone speech; dictated newspaper text; heteroscedastic linear discriminant analysis; large-vocabulary continuous speech recognition; multiparty meeting transcription; pitch-synchronous acoustic feature sets; pitch-synchronous analysis; spectral representations; vocal tract length normalization; word error rate; Feature combination; ROVER; STRAIGHT; heteroscedastic linear discriminant analysis (HLDA); large-vocabulary continuous speech recognition (LVCSR); pitch-synchronous; vocal tract length normalization (VTLN);

fLanguage :

English

Journal_Title :

Audio, Speech, and Language Processing, IEEE Transactions on

Publisher :

IEEE

ISSN :

1558-7916

Type :

jour

DOI :

10.1109/TASL.2008.916519

Filename :

4443886

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=1051883