DocumentCode :
2118148
Title :
Text Classification for Imbalanced Data Sets
Author :
Li, Yanling ; Zhu, Yehang ; Yang, Ping
Author_Institution :
Xi'an Res. Inst. of Hi-Technol., Xi'an
Volume :
2
fYear :
2008
fDate :
20-22 Dec. 2008
Firstpage :
778
Lastpage :
781
Abstract :
Imbalanced data sets significantly degrade the classification performance attainable by most standard machine learning algorithms, yet in practice the samples are often imbalanced. Reducing the effect of unevenly distributed training sets on text classification performance is therefore a major challenge for machine learning on imbalanced data sets. Current research on imbalanced data falls mainly into two areas: data-level and algorithm-level approaches. This paper studies three solutions: sample set restructuring, an enhanced feature selection method, and weight retouch. Experimental results show that these methods are effective in improving classification performance. After comparing and analyzing the effects of these methods in the experiments, the paper draws some useful conclusions on several key issues: which texts to sample and how many to sample when restructuring the sample set, whether to define a separate threshold for each category in feature selection, and how to adjust the weights in the classification algorithm.
Keywords :
learning (artificial intelligence); pattern classification; text analysis; enhancement method; feature selection; imbalanced data sets; machine learning; sample set restructuring; text classification performance; training sets; uneven distribution; weight retouch; re-sampling; text classification
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Information Science and Engineering, 2008. ISISE '08. International Symposium on
Conference_Location :
Shanghai
Print_ISBN :
978-1-4244-2727-4
Type :
conf
DOI :
10.1109/ISISE.2008.89
Filename :
4732504