DocumentCode
2042211
Title
A new feature weighting method based on probability distribution in imbalanced text classification
Author
Chu, Leilei ; Gao, Hui ; Chang, Wenbo
Author_Institution
Fac. of Sci., Xi'an Jiaotong Univ., Xi'an, China
Volume
5
fYear
2010
fDate
10-12 Aug. 2010
Firstpage
2335
Lastpage
2339
Abstract
Many real-world text classification tasks involve imbalanced training examples. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We propose a new approach that uses a probability distribution to assign feature weights and apply it to the Naive Bayes classifier. The method is evaluated in experiments on the FuDan Chinese Corpus. The experimental results show significant improvement on imbalanced datasets, while performance on balanced datasets is not jeopardized. Our approach offers a simple and effective solution for boosting text classification performance on skewed datasets.
Keywords
Bayes methods; pattern classification; statistical distributions; text analysis; FuDan Chinese Corpus; Naive Bayes classifier; feature weight; imbalanced text classification; probability distribution; Accuracy; Machine learning; Niobium; Probability distribution; Tagging; Text categorization; Training; Feature weighting; Imbalanced text classification; Naive Bayes; Skew;
fLanguage
English
Publisher
IEEE
Conference_Titel
Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
Conference_Location
Yantai, Shandong
Print_ISBN
978-1-4244-5931-5
Type
conf
DOI
10.1109/FSKD.2010.5569830
Filename
5569830