DocumentCode :
3433791
Title :
Incorporating information from syllable-length time scales into automatic speech recognition
Author :
Wu, Su-Lin ; Kingsbury, Brian E D ; Morgan, Nelson ; Greenberg, Steven
Author_Institution :
Int. Comput. Sci. Inst., Berkeley, CA, USA
Volume :
2
fYear :
1998
fDate :
12-15 May 1998
Firstpage :
721
Abstract :
Including information distributed over intervals of syllabic duration (100-250 ms) may greatly improve the performance of automatic speech recognition (ASR) systems. ASR systems primarily use representations and recognition units covering phonetic durations (40-100 ms). Humans certainly use information at phonetic time scales, but results from psychoacoustics and psycholinguistics highlight the crucial role of the syllable, and syllable-length intervals, in speech perception. We compare the performance of three ASR systems: a baseline system that uses phone-scale representations and units, an experimental system that uses a syllable-oriented front-end representation and syllabic units for recognition, and a third system that combines the phone-scale and syllable-scale recognizers by merging and rescoring N-best lists. Using the combined recognition system, we observed an improvement in word error rate over the baseline phone-based system on telephone-bandwidth continuous numbers: from 6.8% to 5.5% on a clean test set, and from 27.8% to 19.6% on a reverberant test set.
Keywords :
decoding; error statistics; feature extraction; pattern classification; signal representation; speech intelligibility; speech processing; speech recognition; 100 to 250 ms; 40 to 100 ms; ASR systems; N-best lists; automatic speech recognition; baseline phone-based system; clean test set; combined recognition system; continuous numbers; experimental system; feature extraction; performance; phone-scale representations; phonetic time scales; psychoacoustics; psycholinguistics; recognition units; reverberant test set; speech decoding; speech intelligibility; speech perception; speech unit classification; syllabic duration; syllable-length time scales; syllable-oriented front-end representation; syllable-scale recognizers; telephone-bandwidth; word error rate; Automatic speech recognition; Computer science; Error analysis; Humans; Merging; Psychoacoustics; Psychology; Speech processing; Speech recognition; System testing;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on
Conference_Location :
Seattle, WA
ISSN :
1520-6149
Print_ISBN :
0-7803-4428-6
Type :
conf
DOI :
10.1109/ICASSP.1998.675366
Filename :
675366