Regional Information Center for Science and Technology - Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus

DocumentCode :

454709

Title :

Unsupervised Adaptation of a Stochastic Language Model Using a Japanese Raw Corpus

Author :

Kurata, Gakuto ; Mori, Shinsuke ; Nishimura, Masafumi

Author_Institution :

IBM Res., IBM Japan Ltd., Kanagawa

Volume :

fYear :

2006

fDate :

14-19 May 2006

Abstract :

The range of applications for large vocabulary continuous speech recognition (LVCSR) systems is expanding. Building a good LVCSR system specialized for a target domain takes a lot of time because experts need to manually segment the corpus of the target domain, which is a labor-intensive task. In this paper, we propose a new method to adapt an LVCSR system to a new domain. In our method, we stochastically segment a Japanese raw corpus of the target domain. A domain-specific language model (LM) is then built from this corpus. All of the domain-specific words can be added to the lexicon for LVCSR. Most importantly, the proposed method is fully automatic, so the time needed to introduce an LVCSR system can be reduced drastically. In addition, the proposed method yielded performance comparable or even superior to that obtained with expensive manual segmentation.

Keywords :

natural languages; speech processing; speech recognition; stochastic processes; Japanese raw corpus; domain-specific words; large vocabulary continuous speech recognition; stochastic language model; unsupervised adaptation; Automatic speech recognition; Degradation; Domain specific languages; Laboratories; Magnetooptic recording; Natural languages; Speech recognition; Stochastic processes; Vocabulary

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Acoustics, Speech and Signal Processing, 2006. ICASSP 2006 Proceedings. 2006 IEEE International Conference on

Conference_Location :

Toulouse

ISSN :

1520-6149

Print_ISBN :

1-4244-0469-X

Type :

conf

DOI :

10.1109/ICASSP.2006.1660201

Filename :

1660201

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=454709