Title :
Improved Speech Recognition using Acoustic and Lexical Correlates of Pitch Accent in a N-Best Rescoring Framework
Author :
Ananthakrishnan, Sankaranarayanan ; Narayanan, Shrikanth
Author_Institution :
Dept. of Electr. Eng., Univ. of Southern California, Los Angeles, CA
Abstract :
Most statistical speech recognition systems make use of segment-level features, derived mainly from spectral envelope characteristics of the signal, but ignore supra-segmental cues that carry additional information likely to be useful for speech recognition. These cues, which constitute the prosody of the utterance and occur at the syllable, word and utterance level, are closely related to the lexical and syntactic organization of the utterance. In this paper, we explore the use of acoustic and lexical correlates of a subset of these cues in order to improve recognition performance on a read-speech corpus, using word error rate (WER) as the metric. Using the features and methods described in this paper, we were able to obtain a relative WER improvement of 1.3% over a baseline ASR system on the Boston University Radio News Corpus.
Keywords :
speech recognition; Boston University Radio News Corpus; acoustic correlates; lexical correlates; n-best rescoring framework; pitch accent; read-speech corpus; segment-level features; spectral envelope; word error rate; Acoustical engineering; Automatic speech recognition; Data mining; Error analysis; Feature extraction; Laboratories; Natural languages; Speech analysis; Viterbi algorithm; prosody; re-ranking N-best lists
Conference_Title :
Acoustics, Speech and Signal Processing, 2007. ICASSP 2007. IEEE International Conference on
Conference_Location :
Honolulu, HI
Print_ISBN :
1-4244-0727-3
ISSN :
1520-6149
DOI :
10.1109/ICASSP.2007.367209