Continuous speech recognition using large vocabulary word spotting and CV syllable spotting

Author

Sugamura, Noboru

Author_Institution

NTT Corp., Kanagawa, Japan

fYear

1990

fDate

3-6 Apr 1990

Firstpage

121

Abstract

A continuous-speech recognition system for Japanese text input is described. The system is speaker dependent and recognizes continuous phrasal speech. A consonant-vowel (CV) concatenation syllable lattice and a word lattice are generated from input speech using a vector-quantization-based continuous DP matching technique. These lattices are converted into Japanese written form using a word dictionary and grammatical information. Two methods for acoustic processing are proposed: a large vocabulary word spotting technique using high-frequency words and an optimization method for adjusting CV templates to a speaker by maximizing recognition accuracy. In recognition experiments, 500 words were selected as word templates, covering more than 90% of the technical words in reports written about X-ray computer tomography scanning. A total of 1000 CV syllables were generated using these words. A word spotting accuracy of more than 70% and a CV spotting accuracy of 72% were obtained after iterative training. The CV spotting accuracy was improved by about 10% through iterative training.

Keywords

learning systems; speech analysis and processing; speech recognition; CV syllable spotting; CV templates; Japanese text input; acoustic processing; consonant-vowel concatenation syllable lattice; continuous phrasal speech; continuous-speech recognition system; grammatical information; high-frequency words; iterative training; large vocabulary word spotting; optimization method; speaker dependent; vector-quantization-based continuous DP matching; word dictionary; word lattice; Dictionaries; Humans; Lattices; Loudspeakers; Optimization methods; Speech analysis; Speech processing; Speech recognition; Telegraphy; Telephony; Text recognition; Tomography; Vocabulary

fLanguage

English

Publisher

IEEE

Conference_Titel

Acoustics, Speech, and Signal Processing, 1990. ICASSP-90., 1990 International Conference on

Conference_Location

Albuquerque, NM

ISSN

1520-6149

Type

conf

DOI

10.1109/ICASSP.1990.115553

Filename

115553