A SVM-Based Text Classification Method with SSK-Means Clustering Algorithm

Author

Yan, Hongcan; Lin, Chen; Li, Bicheng

Author_Institution

Zhengzhou Inf. Technol. Inst., Zhengzhou, China

Volume

2

fYear

2009

fDate

7-8 Nov. 2009

Firstpage

379

Lastpage

383

Abstract

SVM-based classification needs a large amount of labeled data to train the classifier model, but labeling a training dataset is a time-consuming and labor-intensive task. Furthermore, the feature space is commonly sparse because of the high dimensionality of text. All of these factors can degrade classification performance. We propose an SVM-based text classification method with an SSK-means clustering algorithm that needs only a small amount of labeled training data. In this approach, the training data, including both labeled and unlabeled samples, are first clustered under the guidance of the labeled data. The unlabeled samples are then labeled according to the clusters obtained, and SVM classifiers are trained on the expanded training dataset. When the training dataset contains only a small amount of labeled data, this method performs better than conventional SVM classifiers.
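The pipeline in the abstract (cluster labeled and unlabeled data together under the guidance of the labeled data, label the unlabeled samples from their clusters, then train an SVM on the expanded set) can be sketched as follows. This is a minimal sketch, not the authors' SSK-means: it assumes a seeded k-means variant with one cluster per class, where each centroid starts at the mean of that class's labeled vectors and labeled vectors never leave their own class's cluster. The names `seeded_kmeans_label` and `expand_training_set` are hypothetical, and the toy 2-D vectors stand in for sparse, high-dimensional text vectors.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Labeled = Tuple[Vector, str]


def _sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _mean(points: List[Vector]) -> Vector:
    """Component-wise mean of a non-empty list of vectors."""
    n = len(points)
    return [sum(p[i] for p in points) / n for i in range(len(points[0]))]


def seeded_kmeans_label(labeled: List[Labeled], unlabeled: List[Vector],
                        max_iter: int = 100) -> List[str]:
    """Assign a class label to every unlabeled vector (hypothetical helper).

    Assumptions (not from the paper): k = number of classes, each centroid is
    seeded with the mean of its class's labeled vectors, and labeled vectors
    always stay in their own class's cluster.
    """
    classes = sorted({y for _, y in labeled})
    # Seeding: the labeled data guide the clustering by fixing initial centroids.
    centroids = {c: _mean([x for x, y in labeled if y == c]) for c in classes}
    assign: List[str] = []
    for _ in range(max_iter):
        # Assignment step: each unlabeled vector joins its nearest centroid.
        new_assign = [min(classes, key=lambda c: _sq_dist(x, centroids[c]))
                      for x in unlabeled]
        if new_assign == assign:  # converged: no assignment changed
            break
        assign = new_assign
        # Update step: labeled members always count toward their own class,
        # so no cluster can become empty.
        for c in classes:
            members = [x for x, y in labeled if y == c]
            members += [x for x, a in zip(unlabeled, assign) if a == c]
            centroids[c] = _mean(members)
    return assign


def expand_training_set(labeled: List[Labeled],
                        unlabeled: List[Vector]) -> List[Labeled]:
    """Return the labeled data plus the cluster-labeled unlabeled data."""
    pseudo = seeded_kmeans_label(labeled, unlabeled)
    return list(labeled) + list(zip(unlabeled, pseudo))


if __name__ == "__main__":
    # Toy 2-D vectors standing in for much higher-dimensional text vectors.
    labeled = [([0.0, 0.0], "sports"), ([5.0, 5.0], "politics")]
    unlabeled = [[0.5, 0.2], [4.8, 5.1], [1.0, 0.9], [4.0, 4.5]]
    print(seeded_kmeans_label(labeled, unlabeled))
    # prints ['sports', 'politics', 'sports', 'politics']
```

The expanded list can then be split into feature and label lists and passed to any SVM trainer, for example scikit-learn's `LinearSVC().fit(X, y)`. For real documents, replacing squared Euclidean distance with cosine distance on TF-IDF vectors would be a natural choice, though that too is an assumption rather than a detail taken from the paper.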

Keywords

support vector machines; text analysis; SSK-means clustering algorithm; SVM-based text classification method; training data; Artificial intelligence; Classification algorithms; Clustering algorithms; Information technology; Partitioning algorithms; Support vector machine classification; Testing; Text categorization; SVM classification; labeled data

fLanguage

English

Publisher

IEEE

Conference_Title

2009 International Conference on Artificial Intelligence and Computational Intelligence (AICI '09)

Conference_Location

Shanghai

Print_ISBN

978-1-4244-3835-8

Electronic_ISBN

978-0-7695-3816-7

Type

conf

DOI

10.1109/AICI.2009.446

Filename

5375806