Regional Information Center for Science and Technology (RICeST) - Incorporating information from syllable-length time scales into automatic speech recognition

DocumentCode :

3433791

Title :

Incorporating information from syllable-length time scales into automatic speech recognition

Author :

Wu, Su-Lin ; Kingsbury, Brian E D ; Morgan, Nelson ; Greenberg, Steven

Author_Institution :

Int. Comput. Sci. Inst., Berkeley, CA, USA

Volume :

fYear :

1998

fDate :

12-15 May 1998

Firstpage :

721

Abstract :

Including information distributed over intervals of syllabic duration (100-250 ms) may greatly improve the performance of automatic speech recognition (ASR) systems. ASR systems primarily use representations and recognition units covering phonetic durations (40-100 ms). Humans certainly use information at phonetic time scales, but results from psychoacoustics and psycholinguistics highlight the crucial role of the syllable, and syllable-length intervals, in speech perception. We compare the performance of three ASR systems: a baseline system that uses phone-scale representations and units, an experimental system that uses a syllable-oriented front-end representation and syllabic units for recognition, and a third system that combines the phone-scale and syllable-scale recognizers by merging and rescoring N-best lists. Using the combined recognition system, we observed an improvement in word error rate for telephone-bandwidth, continuous numbers from 6.8% to 5.5% on a clean test set, and from 27.8% to 19.6% on a reverberant test set, over the baseline phone-based system.

Keywords :

decoding; error statistics; feature extraction; pattern classification; signal representation; speech intelligibility; speech processing; speech recognition; 100 to 250 ms; 40 to 100 ms; ASR systems; N-best lists; automatic speech recognition; baseline phone-based system; clean test set; combined recognition system; continuous numbers; experimental system; feature extraction; performance; phone-scale representations; phonetic time scales; psychoacoustics; psycholinguistics; recognition units; reverberant test set; speech decoding; speech intelligibility; speech perception; speech unit classification; syllabic duration; syllable-length time scales; syllable-oriented front-end representation; syllable-scale recognizers; telephone-bandwidth; word error rate; Automatic speech recognition; Computer science; Error analysis; Humans; Merging; Psychoacoustics; Psychology; Speech processing; Speech recognition; System testing;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 1998. Proceedings of the 1998 IEEE International Conference on

Conference_Location :

Seattle, WA

ISSN :

1520-6149

Print_ISBN :

0-7803-4428-6

Type :

conf

DOI :

10.1109/ICASSP.1998.675366

Filename :

675366

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3433791