Language processing for Chinese speech recognition

Author

Huang, Tingwen; Jiang, Yizhang

Author_Institution

Institute of Automation, Academia Sinica, Beijing

fYear

1994

fDate

13-16 Apr 1994

Firstpage

151

Abstract

A language processing method based on a statistical model is proposed and studied. Unlike the conventional bigram model, this method maps all words in the vocabulary into several equivalence classes according to the collocations between adjacent words. This modified bigram approach retains the simplicity and effectiveness of the bigram model while reducing memory requirements enough to make it practical on a PC. It also alleviates the problem of zero probabilities. All model parameters are estimated from a corpus of 1.6 million words covering a lexicon of 30,000 words, and the modified model is trained automatically by an unsupervised learning procedure. Several tests of decoding Chinese syllable strings into text have been carried out. The average word correct rate is 87%, and for news reports a word correct rate of 96% is reached with this modified bigram model.
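The class-based bigram idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the common factorization P(w_i | w_{i-1}) ≈ P(c(w_i) | c(w_{i-1})) · P(w_i | c(w_i)), and the toy lexicon, the word-to-class map, the training corpus and the add-one smoothing are all hypothetical. The paper derives its classes automatically from adjacent-word collocations with an unsupervised procedure, which is not reproduced here; the classes below are simply given. Decoding a syllable string is done with a Viterbi search whose state is the class of the last word, which is exact because the model conditions only on that class.

```python
import math
from collections import Counter

# Hypothetical toy lexicon: word -> pinyin syllables (tones dropped).
LEXICON = {
    "北京": ("bei", "jing"), "背景": ("bei", "jing"),
    "大学": ("da", "xue"), "音乐": ("yin", "yue"),
    "的": ("de",), "在": ("zai",), "再": ("zai",),
}
# Hypothetical word -> class map, standing in for learned equivalence classes.
WORD_CLASS = {
    "北京": "PLACE", "背景": "NOUN", "大学": "NOUN", "音乐": "NOUN",
    "的": "PART", "在": "PREP", "再": "ADV",
}
BOS = "<s>"  # sentence-start history class

class ClassBigram:
    """Class bigram: log P(w | prev class) = log P(c | prev) + log P(w | c)."""

    def __init__(self, corpus):
        self.num_classes = len(set(WORD_CLASS.values()))
        self.members = Counter(WORD_CLASS.values())   # lexicon words per class
        self.trans, self.hist = Counter(), Counter()  # class transitions, history counts
        self.word_n, self.class_n = Counter(), Counter()
        for sentence in corpus:
            prev = BOS
            for word in sentence:
                c = WORD_CLASS[word]
                self.trans[(prev, c)] += 1
                self.hist[prev] += 1
                self.word_n[word] += 1
                self.class_n[c] += 1
                prev = c

    def logprob(self, word, prev_class):
        c = WORD_CLASS[word]
        # Add-one smoothing (an assumption) keeps unseen transitions nonzero.
        p_class = (self.trans[(prev_class, c)] + 1) / (self.hist[prev_class] + self.num_classes)
        p_word = (self.word_n[word] + 1) / (self.class_n[c] + self.members[c])
        return math.log(p_class) + math.log(p_word)

def decode(model, syllables):
    """Viterbi search over lexicon segmentations of a syllable string."""
    n = len(syllables)
    # best[i]: class of last word -> (log score, word list) for syllables[:i]
    best = [dict() for _ in range(n + 1)]
    best[0][BOS] = (0.0, [])
    for i in range(n):
        for prev_class, (score, words) in best[i].items():
            for word, sylls in LEXICON.items():
                j = i + len(sylls)
                if tuple(syllables[i:j]) != sylls:
                    continue
                cand = score + model.logprob(word, prev_class)
                c = WORD_CLASS[word]
                if c not in best[j] or cand > best[j][c][0]:
                    best[j][c] = (cand, words + [word])
    if not best[n]:
        return None  # no segmentation covers the whole string
    return max(best[n].values(), key=lambda t: t[0])[1]

# Hypothetical segmented training corpus.
CORPUS = [["在", "北京", "大学"], ["北京", "的", "大学"],
          ["音乐", "的", "背景"], ["在", "北京"]]
MODEL = ClassBigram(CORPUS)
```

For example, `decode(MODEL, ["zai", "bei", "jing", "da", "xue"])` returns `["在", "北京", "大学"]`, choosing among the homophones 在/再 and 北京/背景 from class context. The memory saving comes from storing K × K class transitions plus one probability per word instead of a word-by-word table: with a 30,000-word lexicon, a full bigram table has up to 9 × 10^8 entries.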

Keywords

linguistics; natural languages; speech recognition; statistical analysis; unsupervised learning; Chinese speech recognition; Chinese syllable strings; adjacent word; average words correct rate; collocation; decoding; equivalence classes; language processing method; lexicon; memory size; modified bigram approach; statistical model; training; unsupervised learning; vocabulary; zero probabilities; Acoustic testing; Business communication; Character recognition; Materials testing; Natural languages; Parameter estimation; Phase estimation; Probability; Speech recognition; Vocabulary;

fLanguage

English

Publisher

IEEE

Conference_Title

Speech, Image Processing and Neural Networks, 1994. Proceedings, ISSIPNN '94., 1994 International Symposium on

Print_ISBN

0-7803-1865-X

Type

conf

DOI

10.1109/SIPNN.1994.344944

Filename

344944