DocumentCode :
2042211
Title :
A new feature weighting method based on probability distribution in imbalanced text classification
Author :
Chu, Leilei ; Gao, Hui ; Chang, Wenbo
Author_Institution :
Fac. of Sci., Xi'an Jiaotong Univ., Xi'an, China
Volume :
5
fYear :
2010
fDate :
10-12 Aug. 2010
Firstpage :
2335
Lastpage :
2339
Abstract :
Many real-world text classification tasks involve imbalanced training examples. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We propose a new approach that uses a probability distribution to assign feature weights and apply it to the Naive Bayes classifier. The method is evaluated in experiments on the FuDan Chinese Corpus. The experimental results show a significant improvement for imbalanced datasets, while performance on balanced datasets is not jeopardized. Our approach offers a simple and effective way to boost the performance of text classification over skewed datasets.
Keywords :
Bayes methods; pattern classification; statistical distributions; text analysis; FuDan Chinese Corpus; Naive Bayes classifier; feature weight; imbalanced text classification; probability distribution; Accuracy; Machine learning; Niobium; Probability distribution; Tagging; Text categorization; Training; Feature weighting; Imbalanced text classification; Naive Bayes; Skew;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location :
Yantai, Shandong
Print_ISBN :
978-1-4244-5931-5
Type :
conf
DOI :
10.1109/FSKD.2010.5569830
Filename :
5569830