Title :
Automatic Thai audio/transcription segmentation
Author :
Santiklang, Chatree ; Wutiwiwatchai, Chai ; Boonpramuk, Panuthat
Author_Institution :
Control Syst. & Instrum. Eng., King Mongkut Univ. of Technol. Thonburi, Bangkok
Abstract :
This paper proposes an automatic algorithm for segmenting audio and transcription streams for use in building a large vocabulary continuous speech recognition (LVCSR) system. LVCSR training data are often derived from audio materials with available transcriptions, such as sermons and news articles. In Thai, these resources usually take the form of long wave files whose corresponding text articles are written with no explicit word or sentence boundaries, although such boundaries are crucial for acoustic and language model training in LVCSR. The proposed algorithm segments a large wave file into short utterances using energy detection. The transcription is then aligned to each utterance using dynamic time warping (DTW) combined with a classification and regression tree (CART) confidence measure computed on a per-phone basis. An evaluation shows that the DTW alignment procedure still requires improvement, while the CART confidence measure achieves promising results.
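The energy-detection step mentioned in the abstract can be illustrated with a minimal sketch. The abstract gives no parameters, so the frame length, energy threshold, minimum silence duration, and the function names `frame_energies` and `segment_by_energy` are all assumptions chosen for illustration; this is not the authors' implementation.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of each non-overlapping frame."""
    energies = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energies.append(sum(x * x for x in frame) / frame_len)
    return energies


def segment_by_energy(samples, frame_len=160, threshold=1e-3,
                      min_silence_frames=3):
    """Return (start_sample, end_sample) pairs for high-energy regions.

    All default values are hypothetical. A segment is closed only after
    `min_silence_frames` consecutive low-energy frames, so that short
    pauses inside an utterance do not split it.
    """
    energies = frame_energies(samples, frame_len)
    segments = []
    seg_start = None   # index of the first frame of the open segment
    silence_run = 0    # consecutive low-energy frames seen inside a segment
    for i, energy in enumerate(energies):
        if energy >= threshold:
            if seg_start is None:
                seg_start = i
            silence_run = 0
        elif seg_start is not None:
            silence_run += 1
            if silence_run >= min_silence_frames:
                end_frame = i - silence_run + 1  # first silent frame
                segments.append((seg_start * frame_len,
                                 end_frame * frame_len))
                seg_start = None
                silence_run = 0
    if seg_start is not None:
        # Close a segment still open at the end of the signal,
        # dropping any trailing silent frames.
        end_frame = len(energies) - silence_run
        segments.append((seg_start * frame_len, end_frame * frame_len))
    return segments
```

Each returned pair marks one candidate utterance in sample offsets. A fixed threshold is the simplest choice; a threshold derived from the recording's noise floor would be a natural refinement for real audio.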
Keywords :
regression analysis; speech recognition; trees (mathematics); CART confidence measure; LVCSR training data; automatic Thai audio segmentation; dynamic time warping; energy detection; language model training; large vocabulary continuous speech recognition system; long wave files; regression tree; sentence separation; transcription segmentation; Acoustic measurements; Acoustic signal detection; Acoustic waves; Classification tree analysis; Regression tree analysis; Speech recognition; Streaming media; Time measurement; Training data; Vocabulary;
Conference_Title :
Electrical Engineering/Electronics, Computer, Telecommunications and Information Technology, 2009. ECTI-CON 2009. 6th International Conference on
Conference_Location :
Pattaya, Chonburi
Print_ISBN :
978-1-4244-3387-2
Electronic_ISBN :
978-1-4244-3388-9
DOI :
10.1109/ECTICON.2009.5137218