Speaker-independent dictation of Chinese speech with 32K vocabulary

Author

Xu, Bo ; Ma, Bing ; Zhang, Shuwu ; Qu, Fei ; Huang, Taiyi

Author_Institution

Inst. of Autom., Acad. Sinica, Beijing, China

Volume

4

fYear

1996

fDate

3-6 Oct 1996

Firstpage

2320

Abstract

While early machines adopted isolated syllables as input units and required tedious speaker enrollment, our research focuses on speaker-independent, word-based dictation. A carefully designed 120-speaker database was built for training, and inter-syllable context-dependent, tone-dependent and endpoint-dependent acoustic models are applied with a promising MFCC feature. Two-pass acoustic matching accelerates recognition by taking full advantage of the monosyllabic structure of Chinese speech. Complete word bigram and trigram models serve as the language processing module. With these techniques combined, the system reaches 90% character accuracy and runs in almost real time on a Pentium PC without DSP support.
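The abstract names a word bigram and trigram as the language processing module but does not say how the probabilities are estimated or smoothed. The following is a minimal sketch of a word trigram model with recursive linear interpolation (Jelinek-Mercer style) down to a bigram and an add-one unigram. The interpolation scheme, the class name InterpolatedTrigramLM and the weights lam3 and lam2 are all assumptions for illustration, not the authors' method. In a dictation decoder, scores like these would rank the candidate word sequences that survive acoustic matching.

```python
import math
from collections import Counter


class InterpolatedTrigramLM:
    """Word trigram LM with recursive linear interpolation (Jelinek-Mercer style).

    Hypothetical sketch: the smoothing scheme and the weights lam3/lam2 are
    assumptions; the abstract only states that word bigrams and trigrams are used.
    """

    def __init__(self, sentences, lam3=0.6, lam2=0.6):
        self.lam3, self.lam2 = lam3, lam2
        self.uni = Counter()   # c(w)
        self.bi = Counter()    # c(w1, w)
        self.tri = Counter()   # c(w2, w1, w)
        self.h1 = Counter()    # c(w1) seen as a bigram history
        self.h2 = Counter()    # c(w2, w1) seen as a trigram history
        for words in sentences:
            toks = ["<s>", "<s>"] + list(words) + ["</s>"]
            for i in range(2, len(toks)):
                w2, w1, w = toks[i - 2], toks[i - 1], toks[i]
                self.uni[w] += 1
                self.bi[(w1, w)] += 1
                self.tri[(w2, w1, w)] += 1
                self.h1[w1] += 1
                self.h2[(w2, w1)] += 1
        self.total = sum(self.uni.values())

    def p_uni(self, w):
        # Add-one unigram over the vocabulary plus one unknown-word slot,
        # so an unseen word never receives zero probability.
        return (self.uni[w] + 1) / (self.total + len(self.uni) + 1)

    def p_bi(self, w1, w):
        if self.h1[w1] == 0:  # unseen history: back off entirely to the unigram
            return self.p_uni(w)
        ml = self.bi[(w1, w)] / self.h1[w1]
        return self.lam2 * ml + (1 - self.lam2) * self.p_uni(w)

    def prob(self, w2, w1, w):
        if self.h2[(w2, w1)] == 0:  # unseen trigram history: use the bigram estimate
            return self.p_bi(w1, w)
        ml = self.tri[(w2, w1, w)] / self.h2[(w2, w1)]
        return self.lam3 * ml + (1 - self.lam3) * self.p_bi(w1, w)

    def log_prob(self, words):
        # Sentence log-probability with two start padding tokens and an end token.
        toks = ["<s>", "<s>"] + list(words) + ["</s>"]
        return sum(math.log(self.prob(toks[i - 2], toks[i - 1], toks[i]))
                   for i in range(2, len(toks)))


lm = InterpolatedTrigramLM([["我们", "研究", "语音", "识别"], ["语音", "识别", "系统"]])
print(lm.log_prob(["语音", "识别"]) > lm.log_prob(["识别", "语音"]))  # True
```

Recursive interpolation keeps each conditional distribution normalized even when a history was never observed, which matters for a 32K-word vocabulary where most trigram histories are unseen in training.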

Keywords

database management systems; dictation; microcomputer applications; natural languages; speech recognition; word processing; 120 speaker database; 32K vocabulary; Chinese speech; MFCC feature; Pentium PC; character accuracy; endpoint dependent acoustic model; input units; inter syllable context; isolated syllable; language processing module; monosyllabic structure; speaker independent dictation; speaker independent word based dictation; trigram; two pass acoustic matching; word bigram; Acceleration; Context modeling; Digital signal processing; Loudspeakers; Mel frequency cepstral coefficient; Natural languages; Real time systems; Spatial databases; Speech recognition; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Titel

Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on

Conference_Location

Philadelphia, PA

Print_ISBN

0-7803-3555-4

Type

conf

DOI

10.1109/ICSLP.1996.607272

Filename

607272