DocumentCode :
2139044
Title :
A Tibetan Web Text Clustering Model
Author :
Yan, Xiaodong ; Sun, Yuan ; Zhao, Xiaobing ; Yang, Guosheng
Author_Institution :
School of Information Engineering, Minzu University of China, Haidian Beijing 100081, China
fYear :
2010
fDate :
4-6 Dec. 2010
Firstpage :
3388
Lastpage :
3391
Abstract :
In this paper we design and implement a Tibetan topic detection system to process the large volume of Tibetan-language text on the Web. The system classifies Tibetan text into several categories and then performs clustering within each category to obtain the topics. Based on the grammatical features of Tibetan, we propose a Tibetan text clustering model, TTCM (Tibetan Text Clustering Model), for text from Internet news sites. We study feature representation, feature extraction, and clustering in the model separately. The tests performed show that text clustering with this model achieves good accuracy and good recall, so it has high application value.
Keywords :
Accuracy; Classification algorithms; Clustering algorithms; Clustering methods; Feature extraction; Partitioning algorithms; Support vector machine classification; Tibetan clustering; k-means; topic detection and tracking;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
Type :
conf
DOI :
10.1109/ICISE.2010.5690837
Filename :
5690837