Title :
A Tibetan Web Text Clustering Model
Author :
Yan, Xiaodong ; Sun, Yuan ; Zhao, Xiaobing ; Yang, Guosheng
Author_Institution :
School of Information Engineering, Minzu University of China, Haidian Beijing 100081, China
Abstract :
In this paper, we design and implement a Tibetan topic detection system to process the large volume of Tibetan-language text on the Web. The system classifies Tibetan text into several categories and then performs clustering within each category to obtain topics. Based on the grammatical features of Tibetan, we propose a Tibetan Text Clustering Model (TTCM) for text from Internet news sites. We study feature representation, feature extraction, and clustering in the model separately. Tests show that text clustering with this model achieves good accuracy and recall, which suggests that it has high application value.
Keywords :
Accuracy; Classification algorithms; Clustering algorithms; Clustering methods; Feature extraction; Partitioning algorithms; Support vector machine classification; Tibetan clustering; k-means; topic detection and tracking
Conference_Title :
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location :
Hangzhou, China
Print_ISBN :
978-1-4244-7616-9
DOI :
10.1109/ICISE.2010.5690837