DocumentCode :
2561784
Title :
Technology Research of Tibetan Hot Topics Extraction
Author :
Guixian Xu ; Lirong Qiu
Author_Institution :
Sch. of Inf. Eng., Minzu Univ. of China, Beijing, China
fYear :
2015
fDate :
24-27 March 2015
Firstpage :
204
Lastpage :
208
Abstract :
With the rapid growth in the volume of Tibetan-language information, Tibetan text processing has become increasingly important, and hot topic extraction has become one of the tools for Tibetan information analysis. This paper describes a method for extracting hot topics from Tibetan text. First, the construction of the dataset is described. Second, Tibetan word segmentation is presented. Third, feature selection and text representation are carried out, with the classical TFIDF used to calculate the feature weights. Finally, a statistics-based method is used to extract the hot topics. Experiments show that the method extracts topics effectively and that the results reflect the characteristics of each hot topic category. The approach is useful for text classification, information retrieval, and the construction of high-quality corpora.
Keywords :
feature selection; information retrieval; linguistics; natural language processing; pattern classification; statistical analysis; text analysis; word processing; TFIDF; Tibetan hot topic extraction; Tibetan information analysis tools; Tibetan text processing; Tibetan word segmentation; high-quality corpus; statistical-based method; text classification; text representation; Feature extraction; Information processing; Monitoring; Text categorization; XML; TFIDF weighting calculation; hot topic extraction; Tibetan information processing
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Advanced Information Networking and Applications Workshops (WAINA), 2015 IEEE 29th International Conference on
Conference_Location :
Gwangju
Print_ISBN :
978-1-4799-1774-7
Type :
conf
DOI :
10.1109/WAINA.2015.17
Filename :
7096173