Title :
Language processing for Chinese speech recognition
Author :
Huang, Tingwen ; Jiang, Yizhang
Author_Institution :
Inst. of Autom., Acad. Sinica, Beijing
Abstract :
A language processing method based on a statistical model is proposed and studied. Unlike the conventional bigram model, the method maps all the words in the vocabulary into several equivalence classes according to the collocation between adjacent words. This modified bigram approach retains the simplicity and effectiveness of the bigram model while reducing the memory requirement enough to make the approach realizable on a PC. It also moderates the problem of zero probabilities. All the parameters of the model are estimated from a corpus of 1.6 million words, covering a lexicon of 30,000 words. The modified model is trained automatically with an unsupervised learning procedure. Several tests on decoding Chinese syllable strings into text have been carried out. The results show an average word correct rate of 87%. For news reports, the modified bigram model reaches a word correct rate of 96%.
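The abstract does not give the model's equations, so the following is only a minimal sketch of one common class-based bigram factorization, P(w | w_prev) ≈ P(c | c_prev) · P(w | c), which may differ from the authors' formulation. The function names, the toy sentences, the hand-assigned classes in `word2class` (the paper learns its classes without supervision), and the class count of 500 are all assumptions made for illustration. The sketch shows why a word pair never seen in training can still receive a nonzero probability, and how the parameter count shrinks for a 30,000-word lexicon.

```python
from collections import Counter


def train_class_bigram(sentences, word2class):
    """Collect the counts needed for a class-based bigram model."""
    class_bigrams = Counter()  # count(c_prev, c_cur)
    prev_counts = Counter()    # count(c_prev) as the left member of a bigram
    class_counts = Counter()   # count(c) over all tokens
    word_counts = Counter()    # count(w) over all tokens
    for sent in sentences:
        classes = [word2class[w] for w in sent]
        for w, c in zip(sent, classes):
            word_counts[w] += 1
            class_counts[c] += 1
        for a, b in zip(classes, classes[1:]):
            class_bigrams[(a, b)] += 1
            prev_counts[a] += 1
    return class_bigrams, prev_counts, class_counts, word_counts


def class_bigram_prob(prev_word, word, word2class, model):
    """Approximate P(word | prev_word) as P(c | c_prev) * P(word | c)."""
    class_bigrams, prev_counts, class_counts, word_counts = model
    c_prev, c_cur = word2class[prev_word], word2class[word]
    if prev_counts[c_prev] == 0 or class_counts[c_cur] == 0:
        return 0.0
    transition = class_bigrams[(c_prev, c_cur)] / prev_counts[c_prev]
    emission = word_counts[word] / class_counts[c_cur]
    return transition * emission


def parameter_counts(vocab_size, num_classes):
    """Table sizes: full word bigram vs. class bigram plus word emissions."""
    return vocab_size * vocab_size, num_classes * num_classes + vocab_size


# Hypothetical toy data; the classes are hand-assigned here, not learned.
word2class = {"我": "PRON", "你": "PRON", "看": "VERB",
              "读": "VERB", "书": "NOUN", "报": "NOUN"}
sentences = [["我", "看", "书"], ["你", "读", "报"], ["我", "读", "书"]]
model = train_class_bigram(sentences, word2class)

# "你 看" never occurs in the toy corpus, yet its class pair PRON -> VERB does,
# so the class-based estimate is nonzero.
unseen_pair_prob = class_bigram_prob("你", "看", word2class, model)

# 30,000 words is the lexicon size from the abstract; 500 classes is assumed.
word_table, class_table = parameter_counts(30000, 500)
```

With the assumed 500 classes, the full word-bigram table would need 900,000,000 entries, against 280,000 for the class-transition table plus one emission probability per word. The real saving depends on the number of classes, which the abstract does not state.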
Keywords :
linguistics; natural languages; speech recognition; statistical analysis; unsupervised learning; Chinese speech recognition; Chinese syllable strings; adjacent word; average words correct rate; collocation; decoding; equivalence classes; language processing method; lexicon; memory size; modified bigram approach; statistical model; training; vocabulary; zero probabilities; Acoustic testing; Business communication; Character recognition; Materials testing; Parameter estimation; Phase estimation; Probability
Conference_Title :
Speech, Image Processing and Neural Networks, 1994. Proceedings, ISSIPNN '94., 1994 International Symposium on
Print_ISBN :
0-7803-1865-X
DOI :
10.1109/SIPNN.1994.344944