DocumentCode :
2242380
Title :
Japanese large-vocabulary continuous-speech recognition using a business-newspaper corpus
Author :
Matsuoka, Tatsuo ; Ohtsuki, Katsutoshi ; Mori, Takeshi ; Furui, Sadaoki ; Shirai, Katsuhiko
Author_Institution :
NTT Human Interface Labs., Tokyo, Japan
Volume :
1
fYear :
1996
fDate :
3-6 Oct 1996
Firstpage :
22
Abstract :
Studies Japanese large-vocabulary continuous-speech recognition (LVCSR) for a Japanese business newspaper. To enable word N-grams to be used, sentences were first segmented into words (morphemes) using a morphological analyzer. About five years of newspaper articles were used to train N-gram language models. To evaluate our recognition system, we recorded speech data for sentences from another set of articles. Using the speech corpus, LVCSR experiments were conducted. For a 7k vocabulary, the word error rate was 82.8% when no grammar was used and the acoustic models were context-independent. This improved to 20.0% when both bigram language models and context-dependent acoustic models were used.
Keywords :
acoustics; grammars; natural languages; nomograms; speech recognition; vocabulary; Japanese business newspaper corpus; Japanese large-vocabulary continuous-speech recognition; N-gram language model training; bigram language models; context-dependent acoustic models; context-independent acoustic models; grammar; morphemes; morphological analyzer; newspaper articles; sentence segmentation; speech data recording; word N-grams; word error rate; Context modeling; Dictionaries; Frequency; Humans; Laboratories; Natural languages; Speech analysis; Speech recognition; Testing; Vocabulary;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
Type :
conf
DOI :
10.1109/ICSLP.1996.607005
Filename :
607005