Automatic Word Segmentation for Chinese Classics of Tea Based on Tree-Pruning

Author

Fang, Miao ; Jiang, Yi ; Zhao, Qi ; Jiang, Xin

Author_Institution

Northeastern Univ. at Qinhuangdao, Qinhuangdao, China

Volume

1

fYear

2009

fDate

Nov. 30 2009-Dec. 1 2009

Firstpage

438

Lastpage

441

Abstract

Automatic word segmentation is vital for reading, comprehending, and translating the classics. However, the large number of special terms, allusions, and proper names in the classics makes word segmentation difficult. Taking the classics of tea as the subject of research, this paper proposes a method that uses likelihood ratio statistics to select two-character, three-character, and multi-character word candidates, and then segments the classics of tea automatically with a tree-pruning algorithm. The computational complexity of the tree-pruning algorithm is O(LN), where L is the number of Chinese characters in the longest word. Experiments show that the method yields better word-segmentation results.
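
The abstract does not spell out the tree-pruning algorithm, so the following is only a minimal sketch of one way a segmenter can stay within O(LN) candidate lookups: a dynamic-programming search that, at each of the N character positions, tries at most L word lengths. It is not the authors' method. The function name `segment`, the `scores` dictionary (standing in for likelihood-ratio scores of word candidates), and the rule that single characters are always allowed with score 0 are all assumptions made for illustration.

```python
from typing import Dict, List


def segment(text: str, scores: Dict[str, float], max_len: int) -> List[str]:
    """Return the highest-scoring segmentation of `text`.

    `scores` maps hypothetical word candidates to scores (assumed here to
    come from a likelihood-ratio test). Single characters are always
    permitted, with score 0 unless `scores` says otherwise, so a
    segmentation always exists. `max_len` plays the role of L.
    """
    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best score for text[:i]
    back = [0] * (n + 1)              # back[i]: start of the last word in text[:i]
    best[0] = 0.0

    for end in range(1, n + 1):
        # Only the last max_len start positions are considered, so the
        # total number of candidates examined is at most max_len * n.
        for length in range(1, min(max_len, end) + 1):
            start = end - length
            word = text[start:end]
            if length == 1:
                score = scores.get(word, 0.0)
            elif word in scores:
                score = scores[word]
            else:
                continue  # not a known candidate: drop this branch
            candidate = best[start] + score
            if candidate > best[end]:
                best[end] = candidate
                back[end] = start

    # Walk the back-pointers from the end of the text to recover the words.
    words: List[str] = []
    pos = n
    while pos > 0:
        words.append(text[back[pos]:pos])
        pos = back[pos]
    words.reverse()
    return words
```

For instance, with made-up scores `{"陆羽": 4.0, "茶经": 5.0}` and `max_len=2`, calling `segment("陆羽茶经", scores, 2)` splits the text into the two two-character words 陆羽 and 茶经. The scores here are invented and carry no statistical meaning.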

Keywords

computational complexity; trees (mathematics); word processing; Chinese classics; automatic word segmentation; computation complexity; likelihood ratio statistics; tree-pruning algorithm; Dictionaries; Frequency; Gaussian distribution; History; Knowledge acquisition; Statistical distributions; Statistics; classics of tea; segmentation; tree-pruning

fLanguage

English

Publisher

IEEE

Conference_Titel

Knowledge Acquisition and Modeling, 2009. KAM '09. Second International Symposium on

Conference_Location

Wuhan

Print_ISBN

978-0-7695-3888-4

Type

conf

DOI

10.1109/KAM.2009.80

Filename

5362115