DocumentCode :
3315203
Title :
Conditional Random Fields combined FSM stemming method for Uyghur
Author :
Wumaier, Aishan ; Yibulayin, Tuergen ; Kadeer, Zaokere ; Tian, Shengwei
Author_Institution :
Sch. of Inf. Sci. & Eng., Xinjiang Univ., Urumqi, China
fYear :
2009
fDate :
8-11 Aug. 2009
Firstpage :
295
Lastpage :
299
Abstract :
This paper presents a stemming algorithm for Uyghur that combines a Uyghur noun suffix DFA with conditional random fields (CRF). Because Uyghur is an agglutinative language, stemming is an essential task for Uyghur language processing applications. We generate finite state machines (FSMs) for Uyghur noun inflectional suffixes by applying the morphotactic rules in reverse order. However, eight of the suffixes are similar to the ending part of some words, and these suffixes make the FSM ambiguous. We apply a CRF model to resolve this ambiguity. This paper describes the steps of generating the FSM, building the CRF suffix identifying model, and combining the CRF with the FSM.
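The idea of running the morphotactic rules in reverse order, and of why the resulting FSM becomes ambiguous, can be illustrated with a small sketch. It is not the authors' construction. It assumes a hypothetical, heavily simplified noun template (stem + plural + possessive + case) and a tiny illustrative suffix inventory in Uyghur Latin script. The function name `analyses` and the constant `MIN_STEM_LEN` are likewise hypothetical. The automaton starts at the end of the word and visits the case, possessive and plural slots in that order. Every slot is optional, so one word can be accepted along several paths. Choosing among those paths is the step the paper assigns to the CRF model, which this sketch does not implement.

```python
from typing import List, Tuple

# Hypothetical, simplified suffix inventory (Uyghur Latin script), NOT the
# paper's rule set. Slots are listed in the order the automaton visits them:
# from the END of the word backwards, i.e. the reverse of the assumed
# morphotactic order stem + plural + possessive + case.
REVERSE_SLOTS: List[Tuple[str, List[str]]] = [
    ("case", ["ning", "ni", "gha", "ge", "da", "de", "ta", "te", "din", "tin"]),
    ("possessive", ["imiz", "im", "ing", "i", "si"]),
    ("plural", ["lar", "ler"]),
]

MIN_STEM_LEN = 2  # assumption: reject candidate stems shorter than this


def analyses(word: str) -> List[Tuple[str, List[Tuple[str, str]]]]:
    """Enumerate every (stem, suffixes) split accepted by the reverse-order FSM.

    Because each slot may be empty, several paths can accept the same word;
    those competing analyses are the ambiguity a disambiguator must resolve.
    """
    results: List[Tuple[str, List[Tuple[str, str]]]] = []

    def walk(rest: str, slot: int, stripped: List[Tuple[str, str]]) -> None:
        if slot == len(REVERSE_SLOTS):
            if len(rest) >= MIN_STEM_LEN:
                # Suffixes were collected right-to-left; report them left-to-right.
                results.append((rest, list(reversed(stripped))))
            return
        name, suffixes = REVERSE_SLOTS[slot]
        # Transition 1: this slot is empty; move on to the next slot.
        walk(rest, slot + 1, stripped)
        # Transition 2: strip any suffix of this slot that ends the remaining string.
        for suf in suffixes:
            if rest.endswith(suf) and len(rest) > len(suf):
                walk(rest[: -len(suf)], slot + 1, stripped + [(name, suf)])

    walk(word, 0, [])
    return results


if __name__ == "__main__":
    for stem, sufs in analyses("kitablarda"):
        print(stem, sufs)
```

With this toy inventory, `kitablarda` yields three competing analyses: the whole word as a stem, `kitablar` + `da`, and `kitab` + `lar` + `da`. A second model, such as the CRF described in the abstract, is then needed to select one of them.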
Keywords :
deterministic automata; finite state machines; natural language processing; random processes; CRF suffix identifying model; DFA; FSM stemming method; Uyghur language processing; Uyghur noun inflectional suffix; agglutinative language; conditional random field model; deterministic finite automaton; finite state machine; morphotactic rule; reverse order; Algorithm design and analysis; Automata; Buildings; Dictionaries; Doped fiber amplifiers; Information science; Morphology; Natural language processing; Natural languages; Statistical analysis; Ambiguous FSM; CRF; Uyghur; stemming;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computer Science and Information Technology, 2009. ICCSIT 2009. 2nd IEEE International Conference on
Conference_Location :
Beijing
Print_ISBN :
978-1-4244-4519-6
Electronic_ISBN :
978-1-4244-4520-2
Type :
conf
DOI :
10.1109/ICCSIT.2009.5234727
Filename :
5234727