DocumentCode
294646
Title
Duration modeling in large vocabulary speech recognition
Author
Anastasakos, Anastasios ; Schwartz, Richard ; Shu, Han
Author_Institution
Northeastern Univ., Boston, MA, USA
Volume
1
fYear
1995
fDate
9-12 May 1995
Firstpage
628
Abstract
This paper presents a study of different methods for phoneme duration modeling in large vocabulary speech recognition. We investigate the use of phoneme duration and the effect of context, speaking rate, and lexical stress on the duration of phoneme segments in a large vocabulary speech recognition system. The duration models are used in a postprocessing phase of BYBLOS, our baseline HMM-based recognition system, to rescore the N-Best hypotheses. We describe experiments with the 5K-word ARPA Wall Street Journal (WSJ) corpus. The results show that integrating duration models that take into account context and speaking rate can improve the word accuracy of the baseline recognition system.
Keywords
hidden Markov models; speech recognition; ARPA Wall Street Journal corpus; BYBLOS; N-Best hypotheses; baseline HMM-based recognition system; context; duration modeling; large vocabulary speech recognition; lexical stress; phoneme; postprocessing phase; speaking rate; word accuracy; Context modeling; Costs; Density functional theory; Employment; Hidden Markov models; Signal analysis; Speech analysis; Speech recognition; Stress; Vocabulary
fLanguage
English
Publisher
IEEE
Conference_Titel
Acoustics, Speech, and Signal Processing, 1995. ICASSP-95., 1995 International Conference on
Conference_Location
Detroit, MI
ISSN
1520-6149
Print_ISBN
0-7803-2431-5
Type
conf
DOI
10.1109/ICASSP.1995.479676
Filename
479676