DocumentCode :
3529114
Title :
Unsupervised acoustic and language model training with small amounts of labelled data
Author :
Novotney, Scott ; Schwartz, Richard ; Ma, Jeff
Author_Institution :
BBN Technologies, Cambridge, MA
fYear :
2009
fDate :
19-24 April 2009
Firstpage :
4297
Lastpage :
4300
Abstract :
We measure the effect of a weak language model, estimated from as little as 100k words of text, on unsupervised acoustic model training, and then explore the best method of using word confidences to estimate n-gram counts for unsupervised language model training. Even with only 100k words of text and 10 hours of training data, unsupervised acoustic modeling is robust, recovering 50% of the gain of supervised training. For language model training, multiplying the word confidences of an n-gram together to form a weighted count performs best, reducing WER by 2% over the baseline language model and by 0.5% absolute over using unweighted transcripts. Oracle experiments show that a larger gain is possible, but better confidence estimation techniques are needed to identify correct n-grams.
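As a concrete illustration of the confidence-weighted counting scheme the abstract describes, the following minimal Python sketch accumulates fractional n-gram counts from ASR hypotheses, weighting each n-gram by the product of its word confidences. This is not the authors' implementation; the data layout (a list of (word, confidence) pairs per decoded utterance) and the function name are assumptions made for illustration.

```python
from collections import defaultdict

def weighted_ngram_counts(hypotheses, n=3):
    """Accumulate fractional n-gram counts from decoded hypotheses.

    `hypotheses` is an iterable of ASR outputs, each a list of
    (word, confidence) pairs. Each n-gram's count is weighted by the
    product of its word confidences (a hypothetical sketch of the
    best-performing scheme in the abstract, not the paper's code).
    """
    counts = defaultdict(float)
    for hyp in hypotheses:
        for i in range(len(hyp) - n + 1):
            window = hyp[i:i + n]
            ngram = tuple(word for word, _ in window)
            weight = 1.0
            for _, conf in window:
                weight *= conf  # product of word confidences
            counts[ngram] += weight
    return counts

# Hypothetical usage on two decoded utterances:
hyps = [
    [("the", 0.9), ("cat", 0.8), ("sat", 0.95)],
    [("the", 0.85), ("cat", 0.7), ("ran", 0.6)],
]
print(dict(weighted_ngram_counts(hyps, n=3)))
# {('the', 'cat', 'sat'): 0.684, ('the', 'cat', 'ran'): 0.357}
```

In such a setup, the fractional counts would then feed a standard n-gram estimator with smoothing, in place of the integer counts obtained from unweighted transcripts.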
Keywords :
acoustic signal processing; estimation theory; natural language processing; speech processing; language model training; n-gram count; unsupervised acoustic model training; word confidence estimation; Acoustic measurements; Decoding; Labeling; Natural languages; Robustness; Speech recognition; Telephony; Terminology; Training data; Vocabulary; Conversational Telephone Speech; Language Modeling; Unsupervised Training; Word Confidence;
fLanguage :
English
Publisher :
ieee
Conference_Titel :
Acoustics, Speech and Signal Processing, 2009. ICASSP 2009. IEEE International Conference on
Conference_Location :
Taipei
ISSN :
1520-6149
Print_ISBN :
978-1-4244-2353-8
Type :
conf
DOI :
10.1109/ICASSP.2009.4960579
Filename :
4960579