Title :
Research on recognition of semantic chunk boundary in Tibetan
Author :
Tianhang Wang ; Shumin Shi ; Heyan Huang ; Congjun Long ; Ruijing Li
Author_Institution :
Sch. of Comput. Sci. & Technol., Beijing Inst. of Technol., Beijing, China
Abstract :
Semantic chunks describe the semantic framework of a sentence well and play an important role in natural language processing applications such as machine translation and question answering (QA) systems. At present, research on Tibetan chunking is mainly rule-based. In this paper, drawing on the distinctive linguistic characteristics of Tibetan, we first put forward a descriptive definition of the Tibetan semantic chunk and its labeling scheme, and then propose a feature selection algorithm that automatically selects suitable templates from the candidate feature templates. In experiments on two different kinds of Tibetan corpus, namely corpus-sentence and corpus-discourse, the F-measure reaches 95.84% and 94.95% using the Conditional Random Fields (CRF) model and 91.97% and 88.82% using the Maximum Entropy (ME) model. These positive results show that the definition of the Tibetan semantic chunk proposed in this paper is reasonable and operable, and that boundary recognition via statistical techniques is feasible and effective on a small-scale corpus.
Keywords :
feature selection; maximum entropy methods; natural language processing; statistical analysis; word processing; CRF model; F-measure; ME model; Tibetan corpus; Tibetan semantic chunk; conditional random fields; corpus-discourse; corpus-sentence; feature selection algorithm; feature-templates; labeling scheme; maximum entropy; semantic chunk boundary recognition; sentence semantic framework; Computer science; Educational institutions; Entropy; Information processing; Labeling; Semantics; Training; CRF; ME; chunk boundary recognition
Conference_Title :
Asian Language Processing (IALP), 2014 International Conference on
Conference_Location :
Kuching
DOI :
10.1109/IALP.2014.6973476