Title :
Towards automatic corpus preparation for a German broadcast news transcription system
Author :
Macherey, Wolfgang ; Ney, Hermann
Author_Institution :
Lehrstuhl für Informatik VI, Computer Science Department, RWTH Aachen - University of Technology, 52056, Germany
Abstract :
When setting up a speech recognition system for a new domain, considerable manual effort is spent on corpus preparation, i.e., data acquisition, cutting and segmentation of the audio material, generation of pronunciation lexica, and the definition of suitable training and test sets. In this paper we describe several methods that help to automate and thus speed up this procedure. For this purpose, we assume that only a preliminary, partially incorrect textual transcription is available. The effectiveness of the proposed methods is demonstrated through the development of a transcription system for the recognition of German broadcast news.
Keywords :
Adaptation model; Biomedical monitoring; Irrigation; Markov processes; Optimization; Speech; Temperature sensors;
Conference_Title :
Acoustics, Speech, and Signal Processing (ICASSP), 2002 IEEE International Conference on
Conference_Location :
Orlando, FL, USA
Print_ISBN :
0-7803-7402-9
DOI :
10.1109/ICASSP.2002.5743822