Tibetan word segmentation system based on conditional random fields

Author

Jiang, Tao ; Yu, Hongzhi ; Jam, Yangkyi

Author_Institution

Key Lab. of China's Nat. Linguistic Inf. Technol., Northwest Univ. for Nat., Lanzhou, China

fYear

2011

fDate

15-17 July 2011

Firstpage

446

Lastpage

448

Abstract

Unlike English and other Western languages, Chinese and Tibetan have no delimiters to mark word boundaries. Word segmentation is therefore the first step in Chinese and Tibetan natural language processing tasks such as machine translation and information retrieval. Chinese word segmentation has been studied for many years and the technology is relatively mature. In contrast, Tibetan word segmentation has received less attention from researchers. In this paper, we draw on Chinese word segmentation approaches, analyze the characteristics of the Tibetan language, and design a Tibetan word segmentation system based on conditional random fields. The experiment shows that the algorithm is effective and is suitable for preliminary application.

Keywords

image segmentation; natural language processing; random processes; word processing; Chinese word segmentation; Tibetan natural language processing; Tibetan word segmentation; conditional random fields; information retrieval; machine translation; Dictionaries; Feature extraction; Hidden Markov models; Laboratories; Markov processes; Natural language processing; Tagging;

fLanguage

English

Publisher

IEEE

Conference_Titel

Software Engineering and Service Science (ICSESS), 2011 IEEE 2nd International Conference on

Conference_Location

Beijing

Print_ISBN

978-1-4244-9699-0

Type

conf

DOI

10.1109/ICSESS.2011.5982349

Filename

5982349