Title :
A cross-channel modeling approach for automatic segmentation of conversational telephone speech [automatic speech recognition applications]
Author :
Liu, Daben ; Kubala, Francis
Author_Institution :
BBN Syst. & Technol. Corp., Cambridge, MA, USA
Date :
30 Nov.-3 Dec. 2003
Abstract :
Automatic segmentation of audio is an essential front-end process for automatic speech recognition applications where the true speech boundaries are unknown. In this paper, we present a cross-channel modeling approach for segmentation in a specific domain: 4-wire recorded conversational telephone speech. The paper describes and compares two types of cross-channel modeling: energy-based and Gaussian mixture model (GMM) based. Since improving speech recognition accuracy is our main objective, the effectiveness of automatic segmentation is measured by the word error rate (WER) and compared with a manual-segmentation baseline. With cross-channel modeling, we obtained a negligible WER difference between manual and automatic segmentation in three different languages. Issues such as training data preparation, features, and language dependency are also discussed.
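The abstract evaluates segmentation by WER. As a reminder of what that metric computes, the following is a minimal sketch in Python, assuming the standard definition: word-level edit distance (substitutions, deletions, insertions) divided by the number of reference words. The function name `word_error_rate` and the whitespace tokenization are illustrative assumptions; this is not the scoring tool used in the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are separated by whitespace and compared exactly.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

Under this definition, comparing the WER from automatically segmented audio against the WER from manually segmented audio, both scored against the same reference transcripts, gives the kind of difference the abstract reports.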
Keywords :
Gaussian processes; error statistics; speech processing; speech recognition; Gaussian mixture model; WER; automatic speech recognition accuracy; automatic speech segmentation; cross-channel modeling method; energy-based modeling; language-dependency; manual segmentation; recorded conversational telephone speech; training data preparation; unknown speech boundaries; word-error-rate; Automatic speech recognition; Benchmark testing; Broadcasting; Crosstalk; NIST; Speech processing; Speech recognition; Telephony; Training data; Wire
Conference_Title :
Automatic Speech Recognition and Understanding, 2003. ASRU '03. 2003 IEEE Workshop on
Print_ISBN :
0-7803-7980-2
DOI :
10.1109/ASRU.2003.1318463