DocumentCode
2729232
Title
Tibetan word segmentation system based on conditional random fields
Author
Jiang, Tao ; Yu, Hongzhi ; Jam, Yangkyi
Author_Institution
Key Lab. of China's Nat. Linguistic Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
fYear
2011
fDate
15-17 July 2011
Firstpage
446
Lastpage
448
Abstract
Unlike English and other Western languages, neither Chinese nor Tibetan uses delimiters to mark word boundaries. Word segmentation is therefore the first step in Chinese and Tibetan natural language processing tasks such as machine translation and information retrieval. Chinese word segmentation has been studied for many years, and the technology is relatively mature. In contrast, Tibetan word segmentation has received less attention from researchers. In this paper, we draw on Chinese word segmentation approaches, analyze the characteristics of the Tibetan language, and design a Tibetan word segmentation system based on conditional random fields. The experiment shows that the algorithm is effective and is ready for preliminary application.
Keywords
image segmentation; natural language processing; random processes; word processing; Chinese word segmentation; Tibetan natural language processing; Tibetan word segmentation; conditional random fields; information retrieval; machine translation; Dictionaries; Feature extraction; Hidden Markov models; Laboratories; Markov processes; Tagging
fLanguage
English
Publisher
IEEE
Conference_Titel
Software Engineering and Service Science (ICSESS), 2011 IEEE 2nd International Conference on
Conference_Location
Beijing
Print_ISBN
978-1-4244-9699-0
Type
conf
DOI
10.1109/ICSESS.2011.5982349
Filename
5982349