Title :
Tibetan word segmentation system based on conditional random fields
Author :
Jiang, Tao ; Yu, Hongzhi ; Jam, Yangkyi
Author_Institution :
Key Lab. of China's Nat. Linguistic Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
Abstract :
Unlike English and other Western languages, Chinese and Tibetan have no delimiters to mark word boundaries. Word segmentation is therefore the first step in Chinese and Tibetan natural language processing tasks such as machine translation and information retrieval. Chinese word segmentation has been studied for many years, and the technology is relatively mature. In contrast, Tibetan word segmentation has received less attention from researchers. In this paper, we draw on Chinese word segmentation approaches, analyze the characteristics of the Tibetan language, and design a Tibetan word segmentation system based on conditional random fields. Experiments show that the algorithm is effective and suitable for preliminary application.
Keywords :
image segmentation; natural language processing; random processes; word processing; Chinese word segmentation; Tibetan natural language processing; Tibetan word segmentation; conditional random fields; information retrieval; machine translation; Dictionaries; Feature extraction; Hidden Markov models; Laboratories; Markov processes; Tagging
Conference_Title :
Software Engineering and Service Science (ICSESS), 2011 IEEE 2nd International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-9699-0
DOI :
10.1109/ICSESS.2011.5982349