Regional Information Center for Science and Technology - Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus

DocumentCode :

2242380

Title :

Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus

Author :

Matsuoka, Tatsuo ; Ohtsuki, Katsutoshi ; Mori, Takeshi ; Furui, Sadaoki ; Shirai, Katsuhiko

Author_Institution :

NTT Human Interface Labs., Tokyo, Japan

Volume :

fYear :

1996

fDate :

3-6 Oct 1996

Firstpage :

Abstract :

This paper studies Japanese large-vocabulary continuous-speech recognition (LVCSR) for a Japanese business newspaper. Because Japanese text is written without spaces between words, sentences were first segmented into words (morphemes) with a morphological analyzer so that word N-grams could be used. About five years of newspaper articles were used to train N-gram language models. To evaluate the recognition system, speech data were recorded for sentences taken from a separate set of articles, and LVCSR experiments were conducted on this speech corpus. For a 7k-word vocabulary, the word error rate was 82.8% when no grammar was used and the acoustic models were context-independent. It improved to 20.0% when bigram language models and context-dependent acoustic models were used together.
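The word error rates in the abstract (82.8% and 20.0%) follow the conventional definition: the minimum number of substitutions, deletions, and insertions needed to turn the recognized word sequence into the reference, divided by the number of reference words. The Python sketch below illustrates only that standard definition; it is not the scoring tool used in the paper. It assumes both transcripts have already been segmented into morphemes, since word-level scoring of Japanese presupposes such a segmentation. The function name `word_error_rate` and the romanized example tokens are hypothetical.

```python
from typing import Sequence


def word_error_rate(reference: Sequence[str], hypothesis: Sequence[str]) -> float:
    """Return (substitutions + deletions + insertions) / len(reference).

    Both inputs are sequences of word (morpheme) tokens. The minimum edit
    distance is computed with standard Levenshtein dynamic programming.
    """
    if not reference:
        raise ValueError("reference must contain at least one word")
    n, m = len(reference), len(hypothesis)
    # prev[j] = edit distance between the reference prefix processed so far
    # and hypothesis[:j]; initially the reference prefix is empty.
    prev = list(range(m + 1))
    for i in range(1, n + 1):
        curr = [i] + [0] * m  # i deletions turn reference[:i] into an empty hypothesis
        for j in range(1, m + 1):
            cost = 0 if reference[i - 1] == hypothesis[j - 1] else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[m] / n


# Hypothetical romanized morpheme sequences, for illustration only.
ref = ["nikkei", "heikin", "kabuka", "ga", "joushou", "shita"]
hyp = ["nikkei", "heikin", "kabuka", "wa", "joushou", "shi", "ta"]
print(word_error_rate(ref, hyp))  # 0.5
```

Here the score is 3/6 = 0.5: one substitution (ga → wa), one substitution (shita → shi), and one insertion (ta) against six reference words. Because insertions are counted, the rate can exceed 100% when the hypothesis is much longer than the reference.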

Keywords :

acoustics; grammars; natural languages; nomograms; speech recognition; vocabulary; Japanese business newspaper corpus; Japanese large-vocabulary continuous-speech recognition; N-gram language model training; bigram language models; context-dependent acoustic models; context-independent acoustic models; grammar; morphemes; morphological analyzer; newspaper articles; sentence segmentation; speech data recording; word N-grams; word error rate; Context modeling; Dictionaries; Frequency; Humans; Laboratories; Natural languages; Speech analysis; Speech recognition; Testing; Vocabulary;

fLanguage :

English

Publisher :

IEEE

Conference_Title :

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location :

Philadelphia, PA

Print_ISBN :

0-7803-3555-4

Type :

conf

DOI :

10.1109/ICSLP.1996.607005

Filename :

607005

Link To Document :

https://search.ricest.ac.ir/dl/search/defaultta.aspx?DTC=49&DC=2242380