DocumentCode :
1954926
Title :
Kazakh Noun Phrase Extraction Based on N-gram and Rules
Author :
Altenbek, Gulila ; Sun, Ruina
Author_Institution :
Coll. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
fYear :
2010
fDate :
28-30 Dec. 2010
Firstpage :
305
Lastpage :
308
Abstract :
The aim of this work is to extract Kazakh phrases and basic noun phrases from a corpus. For phrase extraction, N-gram models were used, specifically bigram and trigram methods. For basic noun phrase extraction, rule-based methods were used. Starting from the grammatical structure of the basic noun phrase, we established a set of rules using the part-of-speech tags and the additional component information of Kazakh basic noun phrases, and extracted basic noun phrases by rule matching. We implemented the extraction of phrases and basic noun phrases on a corpus of 31 days of the Xinjiang Daily. Experimental results showed that the two methods are feasible, with extraction accuracies of 50.8% and 79.1%, respectively.
Keywords :
feature extraction; grammars; natural languages; Kazakh basic noun phrase; Kazakh noun phrase extraction; N-gram model methods; bigram methods; grammar structure; noun phrase extraction; part-of-speech tag; rule-based methods; trigram methods; Accuracy; Data mining; Grammar; Information processing; Knowledge engineering; Probability; XML; Kazakh language; N-gram model; phrase extraction; phrase structure; rules;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Asian Language Processing (IALP), 2010 International Conference on
Conference_Location :
Harbin
Print_ISBN :
978-1-4244-9063-9
Type :
conf
DOI :
10.1109/IALP.2010.19
Filename :
5681581