Title :
Study on automatic extraction method of Tibetan new words
Author :
Yuan Sun ; Xiaodong Yan ; Xiaobing Zhao ; GuoSheng Yang
Author_Institution :
Sch. of Inf. Eng., Minzu Univ. of China, Beijing, China
Abstract :
This paper proposes a model for automatically extracting new Tibetan words. We build a dynamic Tibetan corpus spanning 2009 to 2012 that covers more than 18 Tibetan online media outlets in Tibet, Qinghai, Sichuan, Gansu and Yunnan, and use it to investigate three key techniques for new word extraction: (1) using statistical methods to establish a knowledge base of new Tibetan words; (2) using information entropy and vector space model similarity calculation to extract and filter valid new words; (3) using word co-occurrence techniques to extract words with new meanings.
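The abstract does not give formulas, so the following is only a minimal sketch of how technique (2) is commonly realized, not the authors' implementation. It assumes the corpus is already segmented into tokens, that "information entropy" means the Shannon entropy of the tokens adjacent to a candidate string, and that "similarity calculation" means cosine similarity between context count vectors. The function names `neighbor_entropy` and `cosine_similarity` are hypothetical.

```python
import math
from collections import Counter


def neighbor_entropy(tokens, candidate, side="right"):
    """Shannon entropy (in bits) of the tokens adjacent to `candidate`.

    A candidate that appears next to many different neighbors has high
    entropy, which is often taken as evidence that it is a free-standing word.
    """
    offset = 1 if side == "right" else -1
    neighbors = Counter()
    for i, tok in enumerate(tokens):
        j = i + offset
        if tok == candidate and 0 <= j < len(tokens):
            neighbors[tokens[j]] += 1
    total = sum(neighbors.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in neighbors.values())


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)
```

Under these assumptions, a candidate would be kept when both its left and right neighbor entropy exceed a chosen threshold, and its context vector could be compared against those of known words to filter out invalid strings. The thresholds are not stated in the abstract and would have to be tuned on the corpus.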
Keywords :
entropy; natural language processing; statistical analysis; vectors; word processing; Gansu; Qinghai; Sichuan; Tibet; Tibetan corpus; Tibetan new word; Yunnan; automatic extraction method; information entropy; statistical method; vector space module similarity calculation; word co-occurrence technique; Telecommunications; Tibetan new words; dynamic Tibetan corpus; extraction
Conference_Title :
Computing and Networking Technology (ICCNT), 2012 8th International Conference on
Conference_Location :
Gyeongju
Print_ISBN :
978-1-4673-1326-1