Title :
Chinese word segmentation based on A-priori and adjacent characters
Author :
Wang, Ye ; Huang, Shang-Teng
Author_Institution :
Dept. of Comput. Sci. & Eng., Shanghai Jiao Tong Univ., China
Abstract :
Chinese word segmentation is an important and difficult problem because of the special written format of Chinese, in which words are not separated by spaces. In this paper, an algorithm based on adjacent characters and A-priori is presented for segmentation. The new method uses information about adjacent characters to join n-grams with their adjacent characters. Experimental results on the LDC95T13 Chinese collection show that the new method performs remarkably better than methods based on mutual information.
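To make the idea of growing n-grams by their adjacent characters more concrete, the following is a minimal sketch of generic Apriori-style candidate mining over character n-grams. It is not the authors' algorithm, because the abstract gives no details. The function name `frequent_ngrams` and the parameters `min_support` and `max_n` are hypothetical choices made for illustration. The sketch extends each frequent n-gram by its right-adjacent character. It keeps the longer string only if both its n-character prefix and its n-character suffix are already frequent (the Apriori property), and only if the longer string itself reaches the support threshold.

```python
from collections import Counter


def frequent_ngrams(sentences, min_support=2, max_n=4):
    """Apriori-style growth of frequent character n-grams (illustrative sketch).

    Hypothetical parameters, not taken from the paper:
      min_support -- minimum occurrence count for an n-gram to be kept
      max_n       -- longest n-gram length to consider
    Returns a dict mapping each frequent n-gram to its occurrence count.
    """
    # Level 1: count single characters and keep the frequent ones.
    counts = Counter(ch for s in sentences for ch in s)
    frequent = {g: c for g, c in counts.items() if c >= min_support}
    result = dict(frequent)

    n = 1
    while frequent and n < max_n:
        candidates = Counter()
        for s in sentences:
            # Join each n-gram with its right-adjacent character.
            for i in range(len(s) - n):
                gram = s[i:i + n + 1]
                # Apriori pruning: every n-length substring of a frequent
                # (n+1)-gram must itself be frequent, so check prefix and suffix.
                if gram[:-1] in frequent and gram[1:] in frequent:
                    candidates[gram] += 1
        # Keep only the (n+1)-grams that reach the support threshold.
        frequent = {g: c for g, c in candidates.items() if c >= min_support}
        result.update(frequent)
        n += 1
    return result


if __name__ == "__main__":
    corpus = ["中国人民", "中国人", "人民中国"]
    for gram, count in sorted(frequent_ngrams(corpus).items()):
        print(gram, count)
```

The sketch stops at mining candidate words. Turning these candidates into an actual segmentation requires a further step that the abstract does not describe. The abstract also leaves open how the paper weighs adjacent-character evidence against the mutual-information statistics used by the baselines.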
Keywords :
natural languages; word processing; A-priori based algorithm; Chinese word segmentation; adjacent characters algorithm; Computer science; Cybernetics; Dictionaries; Machine learning; Mutual information; Statistical analysis; Testing; A-priori; Word segmentation; adjacent characters; n-grams;
Conference_Title :
Proceedings of the 2005 International Conference on Machine Learning and Cybernetics
Conference_Location :
Guangzhou, China
Print_ISBN :
0-7803-9091-1
DOI :
10.1109/ICMLC.2005.1527603