A Tibetan Web Text Clustering Model

Author

Yan, Xiaodong ; Sun, Yuan ; Zhao, Xiaobing ; Yang, Guosheng

Author_Institution

School of Information Engineering, Minzu University of China, Haidian Beijing 100081, China

fYear

2010

fDate

4-6 Dec. 2010

Firstpage

3388

Lastpage

3391

Abstract

In this paper we design and implement a Tibetan topic detection system to process the large volume of Tibetan-language text on the Web. The system classifies Tibetan text into several categories, then performs clustering within each category to identify topics. Based on the grammatical features of Tibetan, we propose a Tibetan Text Clustering Model (TTCM) for text from Internet news sites. We study feature representation, feature extraction, and clustering in the model separately. Tests show that text clustering with this model achieves good accuracy and recall, which indicates high application value.
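
The clustering stage named in the abstract and keywords (feature representation followed by k-means) can be illustrated with a minimal sketch. This is not the authors' TTCM: the record does not give their feature weighting, distance measure, or initialization, so TF-IDF weighting, squared Euclidean distance, and "first k documents as initial centroids" are all assumptions made here for illustration. The function names (`tfidf_vectors`, `kmeans`) are hypothetical, and the input is assumed to be already segmented into tokens; Tibetan word segmentation is not shown, and English placeholder tokens stand in for Tibetan words.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Turn token lists into L2-normalised TF-IDF vectors (assumed weighting)."""
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF so a term present in every document keeps a nonzero weight.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        vectors.append([x / norm for x in vec])
    return vocab, vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100):
    """Plain k-means; the first k vectors seed the centroids (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [-1] * len(vectors)
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Placeholder tokens standing in for segmented Tibetan news text.
docs = [
    ["football", "match", "goal"],
    ["election", "vote", "party"],
    ["goal", "football", "team"],
    ["party", "election", "policy"],
]
vocab, vectors = tfidf_vectors(docs)
labels = kmeans(vectors, k=2)
```

In the system the abstract describes, a step like this would run separately inside each category produced by the classifier, so that each resulting cluster corresponds to a candidate topic within that category.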

Keywords

Accuracy; Classification algorithms; Clustering algorithms; Clustering methods; Feature extraction; Partitioning algorithms; Support vector machine classification; Tibetan clustering; k-means; topic detection and tracking

fLanguage

English

Publisher

IEEE

Conference_Titel

Information Science and Engineering (ICISE), 2010 2nd International Conference on

Conference_Location

Hangzhou, China

Print_ISBN

978-1-4244-7616-9

Type

conf

DOI

10.1109/ICISE.2010.5690837

Filename

5690837

Link To Document

https://search.isc.ac/dl/search/defaultta.aspx?DTC=49&DC=2139044