DocumentCode :
2729232
Title :
Tibetan word segmentation system based on conditional random fields
Author :
Jiang, Tao ; Yu, Hongzhi ; Jam, Yangkyi
Author_Institution :
Key Lab. of China's Nat. Linguistic Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
fYear :
2011
fDate :
15-17 July 2011
Firstpage :
446
Lastpage :
448
Abstract :
Unlike English and other Western languages, neither Chinese nor Tibetan uses delimiters to mark word boundaries. Word segmentation is therefore the first step in Chinese and Tibetan natural language processing tasks such as machine translation and information retrieval. Chinese word segmentation has been studied for many years and the technology is relatively mature; Tibetan word segmentation, in contrast, has received less attention from researchers. In this paper, we draw on approaches to Chinese word segmentation, analyze the characteristics of the Tibetan language, and design a Tibetan word segmentation system based on conditional random fields. Experimental results show that the algorithm is effective and is ready for preliminary application.
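The abstract does not state the tagging scheme or feature templates used. A common way to carry a character-tagging Chinese segmenter over to Tibetan is to treat each tsheg-delimited syllable as a token and label it B (begin), M (middle), E (end) or S (single-syllable word), then let a linear-chain CRF predict the tags. The sketch below assumes that framing; the BMES tag set, the `<PAD>` symbol and the feature names (`s-1`, `s0`, ...) are illustrative assumptions, not the authors' configuration, and CRF training and decoding are left to an off-the-shelf toolkit.

```python
TSHEG = "\u0f0b"  # Tibetan intersyllabic delimiter (tsheg)


def split_syllables(word: str) -> list[str]:
    """Split a Tibetan word into syllables on the tsheg, dropping empty pieces."""
    return [s for s in word.split(TSHEG) if s]


def tags_for_length(n: int) -> list[str]:
    """Assumed BMES scheme: S for a one-syllable word, else B, M..., E."""
    if n == 1:
        return ["S"]
    return ["B"] + ["M"] * (n - 2) + ["E"]


def encode(words: list[str]) -> list[tuple[str, str]]:
    """Turn a gold-segmented sentence into (syllable, tag) pairs for CRF training."""
    pairs: list[tuple[str, str]] = []
    for word in words:
        syls = split_syllables(word)
        pairs.extend(zip(syls, tags_for_length(len(syls))))
    return pairs


def decode(syllables: list[str], tags: list[str]) -> list[str]:
    """Rebuild words from syllables and predicted BMES tags."""
    words: list[str] = []
    current: list[str] = []
    for syl, tag in zip(syllables, tags):
        current.append(syl)
        if tag in ("E", "S"):  # a word ends on E or S
            words.append(TSHEG.join(current))
            current = []
    if current:  # tolerate a malformed tail that ends in B or M
        words.append(TSHEG.join(current))
    return words


def features(syls: list[str], i: int) -> dict[str, str]:
    """Hypothetical window template: syllable unigrams and bigrams around position i."""

    def at(j: int) -> str:
        return syls[j] if 0 <= j < len(syls) else "<PAD>"

    return {
        "s-1": at(i - 1),
        "s0": at(i),
        "s+1": at(i + 1),
        "s-1|s0": at(i - 1) + "|" + at(i),
        "s0|s+1": at(i) + "|" + at(i + 1),
    }
```

For example, the gold segmentation of "བོད་ཀྱི་སྐད་ཡིག" (Tibetan language) as three words encodes to the tag sequence S, S, B, E; `features` then supplies one feature dictionary per syllable for the CRF, and `decode` maps the predicted tags back to words.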
Keywords :
image segmentation; natural language processing; random processes; word processing; Chinese word segmentation; Tibetan natural language processing; Tibetan word segmentation; conditional random fields; information retrieval; machine translation; Dictionaries; Feature extraction; Hidden Markov models; Laboratories; Markov processes; Natural language processing; Tagging; Natural language processing; Tibetan word segmentation; conditional random fields;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Software Engineering and Service Science (ICSESS), 2011 IEEE 2nd International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-9699-0
Type :
conf
DOI :
10.1109/ICSESS.2011.5982349
Filename :
5982349