• DocumentCode
    3010619
  • Title

    A SVM-Based Text Classification Method with SSK-Means Clustering Algorithm

  • Author

    Yan, Hongcan ; Lin, Chen ; Li, Bicheng

  • Author_Institution
    Zhengzhou Inf. Technol. Inst., Zhengzhou, China
  • Volume
    2
  • fYear
    2009
  • fDate
    7-8 Nov. 2009
  • Firstpage
    379
  • Lastpage
    383
  • Abstract
    SVM-based classification needs a large amount of labeled data to train the classifier model, but labeling a training dataset is time-consuming and labor-intensive. Furthermore, the feature space is commonly sparse because of the high dimensionality of text. Both factors can degrade classification performance. We propose an SVM-based text classification method with the SSK-means clustering algorithm that needs only a small amount of labeled training data. In this approach, the training data, both labeled and unlabeled, are first clustered under the guidance of the labeled data. The unlabeled samples are then labeled according to the clusters obtained, and SVM classifiers are trained on the expanded training dataset. When the training dataset contains only a small amount of labeled data, this method performs better than standard SVM classifiers.
  • Keywords
    support vector machines; text analysis; SSK-means clustering algorithm; SVM-based text classification method; training data; Artificial intelligence; Classification algorithms; Clustering algorithms; Information technology; Partitioning algorithms; Support vector machine classification; Testing; Text categorization; SVM classification; labeled data
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Artificial Intelligence and Computational Intelligence, 2009. AICI '09. International Conference on
  • Conference_Location
    Shanghai
  • Print_ISBN
    978-1-4244-3835-8
  • Electronic_ISBN
    978-0-7695-3816-7
  • Type

    conf

  • DOI
    10.1109/AICI.2009.446
  • Filename
    5375806
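
The pipeline summarized in the abstract (cluster labeled and unlabeled data under the guidance of the labels, label the unlabeled samples from the clusters, then train on the expanded set) can be sketched roughly as below. This is only an illustration under stated assumptions: the abstract does not define SSK-means, so the sketch assumes a seeded k-means variant in which each class's labeled samples initialize one centroid and always stay in that class's cluster. The function name `seeded_kmeans_label`, its parameters, and the toy vectors are hypothetical and do not come from the paper.

```python
from math import dist  # Euclidean distance, Python 3.8+


def seeded_kmeans_label(labeled, unlabeled, max_iter=100):
    """Assign a class label to every unlabeled vector.

    labeled:   list of (vector, label) pairs (the seeds)
    unlabeled: list of vectors
    Assumption: one cluster per class, seeds never change cluster.
    """
    classes = sorted({label for _, label in labeled})

    def mean(vectors):
        n = len(vectors)
        return [sum(column) / n for column in zip(*vectors)]

    # Initialise one centroid per class from that class's labeled seeds.
    centroids = {
        c: mean([v for v, label in labeled if label == c]) for c in classes
    }

    assignment = None
    for _ in range(max_iter):
        # Assign each unlabeled vector to the class of its nearest centroid.
        new_assignment = [
            min(classes, key=lambda c: dist(v, centroids[c])) for v in unlabeled
        ]
        if new_assignment == assignment:
            break  # assignments are stable, so the clustering has converged
        assignment = new_assignment
        # Recompute centroids from the fixed seeds plus the assigned vectors.
        for c in classes:
            members = [v for v, label in labeled if label == c]
            members += [v for v, a in zip(unlabeled, assignment) if a == c]
            centroids[c] = mean(members)
    return assignment


# Toy data standing in for document feature vectors (hypothetical values).
labeled = [([0.0, 0.0], "sports"), ([10.0, 10.0], "politics")]
unlabeled = [[1.0, 0.5], [9.0, 9.5], [0.5, 1.5], [8.0, 11.0]]

pseudo_labels = seeded_kmeans_label(labeled, unlabeled)
# Expanded training set: original seeds plus cluster-labeled samples.
expanded = labeled + list(zip(unlabeled, pseudo_labels))
```

The `expanded` list would then serve as the training set for an SVM; with scikit-learn, for instance, the vectors and labels could be passed to `SVC().fit(X, y)`. The kernel, feature weighting, and any cluster-filtering steps used in the paper are not stated in the abstract and are left out here.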