An Improved Density-Based Method for Reducing Training Data in KNN

Author

Yongxia Jing ; Heping Gou ; Yaling Zhu

Author_Institution

Dept. of Inf. Technol., Qiongtai Teachers Coll., Haikou, China

fYear

2013

fDate

21-23 June 2013

Firstpage

972

Lastpage

975

Abstract

The k-Nearest Neighbor (KNN) algorithm is an effective text categorization algorithm in terms of recall and accuracy, but its computational overhead is directly proportional to the sample size, so classification is slow on large-scale sample data. To address this problem, the paper presents a density-based method for reducing training data. The method clusters each class of sample data into several clusters, removes noisy samples, and then combines highly similar documents within each cluster into a single document. Experimental results indicate that the method reduces the computational overhead of KNN text classification while achieving performance approximately equal to that of traditional KNN.
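
The three steps named in the abstract (per-class clustering, noise removal, merging of highly similar documents) can be illustrated with a minimal sketch. The abstract does not give the paper's density criterion, merge rule, or parameter values, so everything below is an assumption: a DBSCAN-style neighbourhood count on cosine similarity stands in for the density step, a greedy centroid merge stands in for the combination step, and the function names and thresholds (`eps_sim`, `min_pts`, `merge_sim`) are hypothetical.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def density_clusters(vectors, eps_sim, min_pts):
    """DBSCAN-style clustering on cosine similarity (an assumed stand-in).
    Returns clusters as lists of indices; unclustered points are noise and omitted."""
    n = len(vectors)
    # neighbours[i] includes i itself whenever vectors[i] is non-zero
    neighbours = [[j for j in range(n) if cosine(vectors[i], vectors[j]) >= eps_sim]
                  for i in range(n)]
    labels = [None] * n
    clusters = []
    for i in range(n):
        if labels[i] is not None or len(neighbours[i]) < min_pts:
            continue  # already assigned, or not a core point
        cid = len(clusters)
        clusters.append([])
        stack = [i]
        while stack:
            p = stack.pop()
            if labels[p] is not None:
                continue
            labels[p] = cid
            clusters[cid].append(p)
            if len(neighbours[p]) >= min_pts:  # only core points expand the cluster
                stack.extend(q for q in neighbours[p] if labels[q] is None)
    return clusters

def merge_similar(vectors, merge_sim):
    """Greedily fold each vector into the first group whose centroid is similar enough."""
    groups = []  # each entry: [component-wise sum, member count]
    for v in vectors:
        for g in groups:
            centroid = [s / g[1] for s in g[0]]
            if cosine(v, centroid) >= merge_sim:
                g[0] = [s + x for s, x in zip(g[0], v)]
                g[1] += 1
                break
        else:
            groups.append([list(v), 1])
    return [[s / count for s in total] for total, count in groups]

def reduce_training_set(samples, eps_sim=0.8, min_pts=2, merge_sim=0.95):
    """samples: list of (vector, label). Thresholds are illustrative assumptions."""
    by_class = defaultdict(list)
    for vec, label in samples:
        by_class[label].append(vec)
    reduced = []
    for label, vecs in by_class.items():
        for cluster in density_clusters(vecs, eps_sim, min_pts):
            members = [vecs[i] for i in cluster]
            reduced.extend((rep, label) for rep in merge_similar(members, merge_sim))
    return reduced

def knn_predict(train, query, k=3):
    """Similarity-weighted KNN vote over (vector, label) pairs."""
    ranked = sorted(train, key=lambda s: cosine(s[0], query), reverse=True)[:k]
    votes = defaultdict(float)
    for vec, label in ranked:
        votes[label] += cosine(vec, query)
    return max(votes, key=votes.get)

# Toy two-dimensional "documents"; the fourth sample is an outlier within class "a".
samples = [
    ([1.0, 0.0], "a"), ([0.99, 0.05], "a"), ([0.98, 0.1], "a"), ([0.0, 1.0], "a"),
    ([0.0, 1.0], "b"), ([0.05, 0.99], "b"), ([0.1, 0.98], "b"),
]
reduced = reduce_training_set(samples)
```

On this toy input the outlier in class "a" is discarded as noise and the remaining samples of each class collapse into one representative, so `knn_predict(reduced, query, k=1)` compares each query against two vectors instead of seven. Real text data would use sparse TF-IDF vectors, and the thresholds would need tuning against held-out accuracy.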

Keywords

pattern classification; pattern clustering; text analysis; KNN algorithm; KNN text classification; classification speed; density-based method; documents; k-nearest neighbor algorithm; large-scale sample data; noise sample data reduction; sample data clustering; sample size; text categorization algorithm; training data reduction; Algorithm design and analysis; Classification algorithms; Clustering algorithms; Noise; Support vector machine classification; Text categorization; Training; samples reducing; similarity; text clustering

fLanguage

English

Publisher

IEEE

Conference_Titel

Computational and Information Sciences (ICCIS), 2013 Fifth International Conference on

Conference_Location

Shiyang

Type

conf

DOI

10.1109/ICCIS.2013.261

Filename

6643177