Regional Information Center for Science and Technology - Silence is golden: Modeling non-speech events in WFST-based dynamic network decoders

DocumentCode :

3162545

Title :

Silence is golden: Modeling non-speech events in WFST-based dynamic network decoders

Author :

Rybach, David ; Schlüter, Ralf ; Ney, Hermann

Author_Institution :

Comput. Sci. Dept., RWTH Aachen Univ., Aachen, Germany

fYear :

2012

fDate :

25-30 March 2012

Firstpage :

4205

Lastpage :

4208

Abstract :

Models for silence are a fundamental part of continuous speech recognition systems. Depending on application requirements, audio data segmentation, and availability of detailed training data annotations, it may be necessary or beneficial to differentiate between other non-speech events, for example breath and background noise. The integration of multiple non-speech models in a WFST-based dynamic network decoder is not straightforward, because these models do not perfectly fit in the transducer framework. This paper describes several options for the transducer construction with multiple non-speech models, shows their considerably different characteristics in memory and runtime efficiency, and analyzes the impact on the recognition performance.

Keywords :

decoding; speech recognition; WFST-based dynamic network decoders; audio data segmentation; nonspeech event model; recognition performance; runtime efficiency; transducer construction; transducer framework; Context; Decoding; Hidden Markov models; Noise; Speech; Speech recognition; Transducers; LVCSR; WFST;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing (ICASSP), 2012 IEEE International Conference on

Conference_Location :

Kyoto

ISSN :

1520-6149

Print_ISBN :

978-1-4673-0045-2

Electronic_ISBN :

1520-6149

Type :

conf

DOI :

10.1109/ICASSP.2012.6288846

Filename :

6288846

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=3162545