Text document pre-processing with the KNN for classification using the SVM

Author

Gayathri, K. ; Marimuthu, A.

Author_Institution

Department of Computer Science, Nirmala College of Arts and Science, Coimbatore, India

fYear

2013

Firstpage

453

Lastpage

457

Abstract

Document classification can be defined as the task of automatically categorizing collections of electronic documents into their annotated classes, based on their contents. In recent years this has become important due to the advent of large amounts of data in digital form. For several decades now, document classification in the form of text classification systems has been widely implemented in numerous applications such as spam filtering, e-mail, knowledge repositories, and ontology mapping. The main objective is to propose a text classification approach based on feature selection and preprocessing, thereby reducing the dimensionality of the feature vector and increasing the classification accuracy. We study the advantages and disadvantages of K-nearest neighbor (KNN) classification and Support Vector Machine (SVM) classification in performing their classification tasks. In our investigation, we found that the otherwise well-performing KNN classification approach can be less accurate than SVM classification.
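
The pipeline outlined in the abstract (preprocessing, feature selection to shrink the feature vector, then KNN and SVM classification) might be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say which feature selection method, term weighting, or SVM solver was used, so document-frequency thresholding, TF-IDF weights, cosine-similarity KNN, and a Pegasos-style linear SVM are all assumptions. The toy corpus, stop-word list, and every function name are hypothetical.

```python
import math
from collections import Counter

# Assumed stop-word list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "for", "of", "to", "your"}

# Made-up toy corpus (not from the paper): (text, label) pairs.
TRAIN = [
    ("Win cash prize now", "spam"),
    ("Cheap prize offer, win now", "spam"),
    ("Claim your cash offer", "spam"),
    ("Meeting agenda for the project", "ham"),
    ("Project report attached for review", "ham"),
    ("Review the meeting notes", "ham"),
]

def tokenize(text):
    """Preprocessing: lowercase, keep alphabetic tokens, drop stop words."""
    words = "".join(c if c.isalpha() else " " for c in text.lower()).split()
    return [w for w in words if w not in STOP_WORDS]

def select_features(token_lists, min_df=2):
    """Feature selection: keep terms found in at least min_df documents."""
    df = Counter(t for tokens in token_lists for t in set(tokens))
    return {t: n for t, n in df.items() if n >= min_df}

def vectorize(tokens, df, n_docs):
    """Sparse, L2-normalised TF-IDF vector over the selected terms only."""
    tf = Counter(t for t in tokens if t in df)
    vec = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def dot(u, v):
    """Dot product of two sparse vectors stored as dicts."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())

def knn_predict(x, train_vecs, labels, k=3):
    """KNN: majority vote among the k most cosine-similar training documents."""
    ranked = sorted(zip(train_vecs, labels),
                    key=lambda pair: dot(x, pair[0]), reverse=True)
    return Counter(label for _, label in ranked[:k]).most_common(1)[0][0]

def train_linear_svm(vecs, ys, lam=0.01, epochs=50):
    """Linear SVM (hinge loss, no bias term) fitted with Pegasos-style
    subgradient steps, cycling through the examples in a fixed order."""
    w, t = {}, 0
    for _ in range(epochs):
        for x, y in zip(vecs, ys):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y * dot(w, x) < 1.0           # margin check before the step
            w = {f: (1.0 - eta * lam) * v for f, v in w.items()}  # regularisation shrink
            if violated:
                for f, v in x.items():
                    w[f] = w.get(f, 0.0) + eta * y * v
    return w

token_lists = [tokenize(text) for text, _ in TRAIN]
labels = [label for _, label in TRAIN]
full_vocab = {t for tokens in token_lists for t in tokens}
df = select_features(token_lists)
train_vecs = [vectorize(tokens, df, len(TRAIN)) for tokens in token_lists]
svm_w = train_linear_svm(train_vecs, [1 if lab == "spam" else -1 for lab in labels])

def classify(text):
    """Return (KNN label, SVM label) for a new document."""
    x = vectorize(tokenize(text), df, len(TRAIN))
    svm_label = "spam" if dot(svm_w, x) > 0 else "ham"
    return knn_predict(x, train_vecs, labels), svm_label

print(len(full_vocab), "terms before selection,", len(df), "after")
for text in ["Win a cash prize", "Project meeting review"]:
    print(text, "->", classify(text))
```

On a corpus this small both classifiers agree, so the sketch shows only the mechanics of the two methods and says nothing about their relative accuracy.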

Keywords

Accuracy; Corporate acquisitions; Marine vehicles; Support vector machines; Feature Selection; K-Nearest Neighbor; Support Vector Machine; Text Classification

fLanguage

English

Publisher

IEEE

Conference_Titel

Intelligent Systems and Control (ISCO), 2013 7th International Conference on

Conference_Location

Coimbatore, Tamil Nadu, India

Print_ISBN

978-1-4673-4359-6

Type

conf

DOI

10.1109/ISCO.2013.6481197

Filename

6481197