Title :
An Improved Condensing Algorithm
Author :
Hao, Xiulan ; Zhang, Chenghong ; Xu, Hexiang ; Tao, Xiaopeng ; Wang, Shuyun ; Hu, Yunfa
Author_Institution :
Dept. of CIT, Fudan Univ., Shanghai
Abstract :
The kNN classifier is widely used in text categorization; however, kNN has large computational and storage requirements, and its performance also suffers from uneven distribution of the training data. Usually, a condensing technique is employed to reduce noise in the training data and to decrease the time and space costs. The traditional condensing technique picks samples at random during initialization. Although random sampling is one means of reducing outliers, extreme stochasticity can sometimes lead to poor performance; that is, the advantages of sampling may be suppressed. To avoid this, we propose a variation of the traditional condensing technique. Experimental results illustrate that this strategy solves the above problems effectively.
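For context, the sketch below shows the general condensed-nearest-neighbour idea the abstract refers to: a small prototype set is grown by absorbing training samples that the current prototypes misclassify, and the random initial picks are replaced here by a deliberate seed per class (the sample nearest to each class centroid). This is an illustrative assumption only, not the authors' exact algorithm; the function name condense, the centroid-based seeding, and the toy data are all hypothetical.

# Minimal condensed-nearest-neighbour sketch with non-random seed selection.
# Assumption: this mimics the kind of initialization the paper argues for,
# but it is not the method proposed in the paper itself.
import numpy as np

def condense(X, y, k=1):
    """Return sorted indices of a condensed prototype set for a kNN classifier."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    keep = []
    # Non-random initialization: one seed per class, the sample nearest to the class centroid.
    for label in np.unique(y):
        idx = np.where(y == label)[0]
        centroid = X[idx].mean(axis=0)
        keep.append(int(idx[np.argmin(np.linalg.norm(X[idx] - centroid, axis=1))]))
    changed = True
    while changed:  # absorb misclassified samples until the prototype set is stable
        changed = False
        for i in range(len(X)):
            if i in keep:
                continue
            d = np.linalg.norm(X[keep] - X[i], axis=1)
            nearest = np.array(keep)[np.argsort(d)[:k]]
            votes = y[nearest].tolist()
            predicted = max(set(votes), key=votes.count)  # majority vote of k nearest prototypes
            if predicted != y[i]:  # wrongly classified -> add this sample to the condensed set
                keep.append(i)
                changed = True
    return sorted(keep)

# Toy usage: two Gaussian blobs with labels 0 and 1.
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(4, 1, (50, 2))])
    y = np.array([0] * 50 + [1] * 50)
    prototypes = condense(X, y, k=1)
    print(f"kept {len(prototypes)} of {len(X)} training samples")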
Keywords :
classification; learning (artificial intelligence); neural nets; sampling methods; text analysis; condensing algorithm; kNN classifier; outlier reduction; random sampling; text categorization; training data; Conference management; Costs; Distributed computing; Electronic mail; Information science; Management training; Noise reduction; Sampling methods; Text categorization; Training data; Condensing Algorithm; Selected Seeds; Text Categorization; kNN;
Conference_Titel :
Seventh IEEE/ACIS International Conference on Computer and Information Science (ICIS 2008)
Conference_Location :
Portland, OR
Print_ISBN :
978-0-7695-3131-1
DOI :
10.1109/ICIS.2008.67