Regional Information Center for Science and Technology - On the use of lattices for the automatic generation of pronunciations

DocumentCode :

3442525

Title :

On the use of lattices for the automatic generation of pronunciations

Author :

Deligne, Sabine ; Mangu, Lidia

Author_Institution :

IBM T. J. Watson Res. Center, Yorktown Heights, NY, USA

Volume :

fYear :

2003

fDate :

6-10 April 2003

Abstract :

In this paper, we explore the use of lattices to generate pronunciations for speech recognition based on the observation of a few (say one or two) speech utterances of a word. Various search strategies are investigated in combination with schemes where single or multiple pronunciations are generated for each speech utterance. In our experiments, a strategy that combines merging time-overlapping links in a context-dependent subphone lattice and generating multiple pronunciations provides the best recognition accuracy. This results in average relative gains of 30% over the generation of single pronunciations using a Viterbi search.

Keywords :

maximum likelihood estimation; speech recognition; Viterbi search; automatic generation; average relative gains; context-dependent subphone lattice; lattices; multiple pronunciations; recognition accuracy; search strategies; speech recognition; speech utterances; time-overlapping links; Cepstral analysis; Context modeling; Decision trees; Gaussian distribution; Gaussian processes; Lattices; Merging; Speech recognition; Viterbi algorithm; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03). 2003 IEEE International Conference on

ISSN :

1520-6149

Print_ISBN :

0-7803-7663-3

Type :

conf

DOI :

10.1109/ICASSP.2003.1198752

Filename :

1198752

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3442525