DocumentCode :
2335965
Title :
A simple KNN algorithm for text categorization
Author :
Soucy, Pascal ; Mineau, Guy W.
Author_Institution :
Dept. of Comput. Sci., Laval Univ., Que., Canada
fYear :
2001
fDate :
2001
Firstpage :
647
Lastpage :
648
Abstract :
Text categorization (also called text classification) is the process of identifying the class to which a text document belongs. This paper proposes a simple KNN algorithm with non-weighted features for text categorization. We propose a feature selection method that finds the features relevant to the learning task at hand using feature interaction (based on word interdependencies). This considerably reduces the number of selected features from which to learn, making our KNN algorithm applicable in contexts where both the volume of documents and the size of the vocabulary are high, as with the World Wide Web. As a result, the proposed KNN algorithm becomes efficient for classifying text documents in that context (in terms of its predictability and interpretability), as is demonstrated. Its simplicity (with respect to its implementation and fine-tuning) becomes its main asset for in-the-field applications.
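Illustration (not part of the original record): a minimal Python sketch of KNN classification over non-weighted features, as the abstract describes in outline. The abstract does not give the similarity measure or the feature selection procedure, so everything here is an assumption. The vocabulary of selected features is taken as already given, similarity is the count of shared features, and the names to_feature_set and knn_classify are hypothetical.

```python
from collections import Counter


def to_feature_set(text, vocabulary):
    """Non-weighted representation: the set of selected words present in the text."""
    return {word for word in text.lower().split() if word in vocabulary}


def knn_classify(text, training, vocabulary, k=3):
    """Return the majority label among the k most similar training documents.

    training is a list of (document_text, label) pairs.
    Similarity is the number of shared features; this measure is an
    assumption, the source abstract does not specify one.
    """
    query = to_feature_set(text, vocabulary)
    scored = [
        (len(query & to_feature_set(doc, vocabulary)), label)
        for doc, label in training
    ]
    # Stable sort: documents with equal similarity keep their training order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Because each feature is either present or absent, no term weights need tuning; only k and the selected vocabulary remain as parameters in this sketch.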
Keywords :
classification; feature extraction; text analysis; World Wide Web; feature interaction; feature selection method; learning task; nonweighted features KNN algorithm; text categorization; text classification; text document; word interdependencies; Computer science; Frequency conversion; Solids; Testing; Text categorization; Unsolicited electronic mail; Vocabulary; Web sites
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on
Conference_Location :
San Jose, CA
Print_ISBN :
0-7695-1119-8
Type :
conf
DOI :
10.1109/ICDM.2001.989592
Filename :
989592