DocumentCode
2139044
Title
A Tibetan web Text Clustering model
Author
Yan, Xiaodong ; Sun, Yuan ; Zhao, Xiaobing ; Yang, Guosheng
Author_Institution
School of Information Engineering, Minzu University of China, Haidian Beijing 100081, China
fYear
2010
fDate
4-6 Dec. 2010
Firstpage
3388
Lastpage
3391
Abstract
In this paper we design and implement a Tibetan topic detection system to process the large volume of Tibetan-language text on the Web. The system classifies Tibetan text into several categories and then performs clustering within each category to obtain the topics. Based on the features of Tibetan grammar, we propose a Tibetan Text Clustering Model (TTCM) for text from Internet news sites. We study feature representation, feature extraction, and clustering in the model separately. Tests show that text clustering with this model achieves good accuracy and recall, so it has high application value.
Keywords
Accuracy; Classification algorithms; Clustering algorithms; Clustering methods; Feature extraction; Partitioning algorithms; Support vector machine classification; Tibetan clustering; k-means; topic detection and tracking;
fLanguage
English
Publisher
IEEE
Conference_Titel
Information Science and Engineering (ICISE), 2010 2nd International Conference on
Conference_Location
Hangzhou, China
Print_ISBN
978-1-4244-7616-9
Type
conf
DOI
10.1109/ICISE.2010.5690837
Filename
5690837