Title :
Chinese word classification based on statistics
Author :
Shi-wan, Zhao ; Ying, Xia ; Shao-ping, Ma ; Yu, Wang ; Zhong, Su
Author_Institution :
Dept. of Comput. Sci. & Technol., Tsinghua Univ., Beijing, China
Abstract :
Statistics-based Chinese word classification plays an important role in natural language processing applications such as speech recognition and intelligent Chinese input methods. We first gather statistics from a large-scale text corpus, and then use average mutual information as the global cost function to cluster all Chinese words into a predefined number of classes with a hybrid approach that combines top-down splitting and bottom-up merging. The classification results are encouraging and can be used in a class-based language model.
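The abstract does not give the formulas or the search procedure, so the following is only a minimal sketch of the general idea, assuming the usual class-bigram definition of average mutual information, I = sum over (c1, c2) of p(c1, c2) * log2(p(c1, c2) / (p(c1) * p(c2))), computed over adjacent tokens of a segmented corpus. The names `average_mutual_information`, `best_merge`, `tokens`, and `word_to_class` are hypothetical, and the exhaustive pairwise search illustrates a single bottom-up merge step only, not the authors' hybrid splitting and merging algorithm.

```python
import math
from collections import Counter


def average_mutual_information(tokens, word_to_class):
    """Average mutual information (bits) between classes of adjacent tokens.

    Assumption: `tokens` is an already segmented corpus (list of words) and
    `word_to_class` maps every word to a class id.
    """
    # Count class bigrams over adjacent token pairs.
    pairs = Counter(
        (word_to_class[a], word_to_class[b]) for a, b in zip(tokens, tokens[1:])
    )
    total = sum(pairs.values())
    if total == 0:
        return 0.0

    # Marginal counts for the left and right positions of a bigram.
    left, right = Counter(), Counter()
    for (c1, c2), n in pairs.items():
        left[c1] += n
        right[c2] += n

    ami = 0.0
    for (c1, c2), n in pairs.items():
        # p(c1, c2) / (p(c1) * p(c2)) simplifies to n * total / (left * right).
        ami += (n / total) * math.log2(n * total / (left[c1] * right[c2]))
    return ami


def best_merge(tokens, word_to_class):
    """Try every pair of classes; return (ami, kept, absorbed) for the merge
    that leaves the highest average mutual information."""
    classes = sorted(set(word_to_class.values()))
    best = None
    for i, kept in enumerate(classes):
        for absorbed in classes[i + 1:]:
            merged = {
                w: (kept if c == absorbed else c) for w, c in word_to_class.items()
            }
            score = average_mutual_information(tokens, merged)
            if best is None or score > best[0]:
                best = (score, kept, absorbed)
    return best
```

Repeating `best_merge` until the predefined number of classes remains would give a purely bottom-up clustering. This brute-force search recomputes the cost from scratch for every candidate pair, so it is only practical for toy vocabularies; a large-scale corpus would need incremental updates of the counts.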
Keywords :
character recognition; natural languages; speech recognition; statistics; Chinese word classification; average mutual information; bottom-up merging approach; class-based language model; global cost function; hybrid top-down splitting; intelligent Chinese input method; large-scale corpus text; natural language processing; Computer science; Cost function; Intelligent systems; Laboratories; Merging; Mutual information; Natural language processing; Natural languages; Speech recognition; Statistics
Conference_Title :
Intelligent Control and Automation, 2000. Proceedings of the 3rd World Congress on
Conference_Location :
Hefei
Print_ISBN :
0-7803-5995-X
DOI :
10.1109/WCICA.2000.862560