• DocumentCode
    2139044
  • Title
    A Tibetan web Text Clustering model
  • Author
    Yan, Xiaodong ; Sun, Yuan ; Zhao, Xiaobing ; Yang, Guosheng
  • Author_Institution
    School of Information Engineering, Minzu University of China, Haidian Beijing 100081, China
  • fYear
    2010
  • fDate
    4-6 Dec. 2010
  • Firstpage
    3388
  • Lastpage
    3391
  • Abstract
    In this paper we design and implement a Tibetan topic detection system to process the large volume of Tibetan-language text on the Web. It classifies Tibetan text into several categories, then performs clustering within each category to obtain the topics. Based on the grammatical features of Tibetan, we propose a Tibetan Text Clustering Model (TTCM) for text from Internet news sites. We study feature representation, feature extraction, and clustering in the model separately. Tests show that text clustering in this model achieves good accuracy and good recall, so it has high application value.
  • Keywords
    Accuracy; Classification algorithms; Clustering algorithms; Clustering methods; Feature extraction; Partitioning algorithms; Support vector machine classification; Tibetan clustering; k-means; topic detection and tracking;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Information Science and Engineering (ICISE), 2010 2nd International Conference on
  • Conference_Location
    Hangzhou, China
  • Print_ISBN
    978-1-4244-7616-9
  • Type
    conf
  • DOI
    10.1109/ICISE.2010.5690837
  • Filename
    5690837