• DocumentCode
    1909366
  • Title
    Tibetan Text Classification Based on the Feature of Position Weight
  • Author
    Hui Cao ; Huiqiang Jia
  • Author_Institution
    Chinese Nat. Inst. of Inf. Technol., Northwest Univ. for Nat., Lanzhou, China
  • fYear
    2013
  • fDate
    17-19 Aug. 2013
  • Firstpage
    220
  • Lastpage
    223
  • Abstract
    Building on a study of Tibetan characters and grammar, this paper investigates term-weighting algorithms for Tibetan text categorization under the vector space model. Taking into account the position at which Tibetan words appear in a text, it proposes an improved TF-IDF weighting algorithm. Feature words are extracted from Tibetan documents with the χ² (CHI) statistic, and cosine similarity is used to measure the similarity between Tibetan texts. Tibetan texts are then classified with a linearly separable support vector machine, and the original TF-IDF algorithm is compared with the improved one on this classification task. The results show that the improved TF-IDF algorithm achieves better classification performance.
  • Keywords
    natural language processing; pattern classification; statistical analysis; support vector machines; text analysis; χ2 statistical method; CHI statistical method; TF-IDF weighting algorithm; Tibetan characters; Tibetan grammar; Tibetan position information; Tibetan text classification algorithm; Tibetan text similarity calculation; Tibetan word document extraction; cosine method; linear separable support vector machine classification; position weight feature; text categorization weight algorithm; vector space model; Classification algorithms; Feature extraction; Indexes; Support vector machine classification; Text categorization; Vectors; Feature words; Position weight; Support Vector Machine; Text classification; Tibetan;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Asian Language Processing (IALP), 2013 International Conference on
  • Conference_Location
    Urumqi
  • Type
    conf
  • DOI
    10.1109/IALP.2013.63
  • Filename
    6646041
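
The abstract describes a TF-IDF weighting that takes word position into account, combined with cosine similarity between document vectors. The record does not give the paper's actual formula or weight values, so the following is only a minimal sketch under stated assumptions: the position names ("title", "body"), the values in POSITION_WEIGHT, the smoothed IDF form, and the placeholder English tokens are all hypothetical, and Tibetan word segmentation, χ² feature selection, and the SVM classifier are left out.

```python
import math
from collections import Counter

# Hypothetical position weights: the abstract does not state the values
# used in the paper, so these numbers are assumptions for illustration.
POSITION_WEIGHT = {"title": 3.0, "body": 1.0}


def weighted_tf(doc):
    """Sum position weights per term; doc maps a position name to a token list."""
    tf = Counter()
    for position, tokens in doc.items():
        for token in tokens:
            tf[token] += POSITION_WEIGHT[position]
    return tf


def tfidf_vectors(docs):
    """Return one sparse {term: weight} vector per document."""
    n = len(docs)
    tfs = [weighted_tf(d) for d in docs]
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())  # each document counts once per term
    # Smoothed IDF (an assumed variant), so a term that occurs in every
    # document still keeps a positive weight.
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1.0 for t in df}
    return [{t: w * idf[t] for t, w in tf.items()} for tf in tfs]


def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


# Placeholder tokens standing in for segmented Tibetan words.
docs = [
    {"title": ["sports"], "body": ["match", "team", "sports"]},
    {"title": ["sports"], "body": ["team", "score"]},
    {"title": ["economy"], "body": ["market", "team"]},
]
vectors = tfidf_vectors(docs)
print(cosine(vectors[0], vectors[1]), cosine(vectors[0], vectors[2]))
```

In this sketch a term in the title contributes three times as much as the same term in the body, so the two documents that share a title word score as more similar to each other than to the third. The resulting vectors could then be passed to a linear SVM, which is the classifier the abstract names.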