Title :
Speaker-independent dictation of Chinese speech with 32K vocabulary
Author :
Xu, Bo ; Ma, Bing ; Zhang, Shuwu ; Qu, Fei ; Huang, Taiyi
Author_Institution :
Inst. of Autom., Acad. Sinica, Beijing, China
Abstract :
While early machines adopted isolated syllables as input units and required tedious speaker enrollment, our research focuses on speaker-independent, word-based dictation. A carefully designed 120-speaker database was built for training; inter-syllable-context-, tone-, and endpoint-dependent acoustic models are applied with the promising MFCC feature. Two-pass acoustic matching accelerates recognition by taking full advantage of the monosyllabic structure of Chinese speech. Complete word bigram and trigram models serve as the language processing module. With these efforts, the system reaches 90% character accuracy and runs in almost real time on a Pentium PC without DSP assistance.
Keywords :
database management systems; dictation; microcomputer applications; natural languages; speech recognition; word processing; 120 speaker database; 32K vocabulary; Chinese speech; MFCC feature; Pentium PC; character accuracy; endpoint dependent acoustic model; input units; inter syllable context; isolated syllable; language processing module; monosyllabic structure; speaker independent dictation; speaker independent word based dictation; trigram; two pass acoustic matching; word bigram; Acceleration; Context modeling; Digital signal processing; Loudspeakers; Mel frequency cepstral coefficient; Natural languages; Real time systems; Spatial databases; Speech recognition; Vocabulary;
Conference_Title :
Spoken Language, 1996. ICSLP 96. Proceedings., Fourth International Conference on
Conference_Location :
Philadelphia, PA
Print_ISBN :
0-7803-3555-4
DOI :
10.1109/ICSLP.1996.607272