• DocumentCode
    2020145
  • Title
    Combining homogeneous classifiers for centroid-based text classification
  • Author
    Lertnattee, Verayuth ; Theeramunkong, Thanaruk
  • Author_Institution
    Inf. Technol. Program, Thammasat Univ., Pathumthani, Thailand
  • fYear
    2002
  • fDate
    2002
  • Firstpage
    1034
  • Lastpage
    1039
  • Abstract
    Centroid-based text classification is one of the most popular supervised approaches for classifying texts into a set of pre-defined classes. Because it is based on the vector-space model, its performance depends particularly on how important terms in documents are weighted and selected when constructing a prototype vector for each class. Previous work showed that term weighting using statistical term distributions could improve classification accuracy. However, the best weighting system differs from one data set to another. To address this problem, we propose a method that uses homogeneous centroid-based classification. The effectiveness of this approach is explored using four data sets. Two main factors are taken into account: model selection and score combination. Experimental results show that our system can improve classification accuracy by up to 7.5-8.5% compared with the k-NN classifier, 3.7-4.0% compared with the naive Bayes classifier, and 1.6-2.7% over the best single-model classification method (p<0.05).
  • Keywords
    Bayes methods; classification; learning (artificial intelligence); neural nets; statistical analysis; text analysis; Bayes classifier; centroid-based text classification; classification accuracy; classification performance; data sets; documents; homogeneous classifiers; homogenous centroid-based classification; k-NN classifier; model selection; online text information; score combination; single-model classification method; statistical term distributions; supervised approach; term weighting; vector-space model; Bayesian methods; Character recognition; Classification algorithms; Frequency; Information technology; Prototypes; Statistics; Support vector machine classification; Support vector machines; Text categorization;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computers and Communications, 2002. Proceedings. ISCC 2002. Seventh International Symposium on
  • ISSN
    1530-1346
  • Print_ISBN
    0-7695-1671-8
  • Type
    conf
  • DOI
    10.1109/ISCC.2002.1021799
  • Filename
    1021799