DocumentCode
179806
Title
A feature score for classifying class-imbalanced data
Author
Pramokchon, Part ; Piamsa-nga, Punpiti
Author_Institution
Dept. of Comput. Eng., Kasetsart Univ., Bangkok, Thailand
fYear
2014
fDate
July 30 2014-Aug. 1 2014
Firstpage
409
Lastpage
414
Abstract
Feature ranking is a filter-based feature selection method that is widely used in text classification. However, many feature scores used for ranking yield low classification performance when applied to data in which class sizes differ drastically. We present a feature score based on the statistical t-test, which evaluates the difference between two sample means, to assess the discriminating power of each individual feature. The t-test-based feature score can be applied even when the numbers of samples in each class are drastically unequal. Therefore, the score is insensitive to class-imbalanced distributions. The multi-class text classification performance of the proposed feature score is compared with that of seven modern feature scores: CMFS, IG, CHI, DF, GINI, OCFS, and DIA. The results show that the micro-averaged F1 on the Reuters-21578 benchmark dataset is 94.2% with the proposed score, whereas none of the other scores exceeds 80%.
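The abstract does not give the exact formula of the proposed score, so the following is only a minimal sketch of one plausible t-test-based feature score, not the authors' method. It assumes a one-vs-rest Welch t statistic (unequal variances, unequal sample sizes) computed per feature, with the largest absolute value across classes taken as the score. The function name `welch_t_scores`, the max-over-classes aggregation, and the toy data are all assumptions made for illustration.

```python
import numpy as np


def welch_t_scores(X, y):
    """Score each feature by its largest absolute one-vs-rest Welch t statistic.

    X: (n_samples, n_features) array of feature values (e.g. term weights).
    y: (n_samples,) array of class labels.
    Assumption: this aggregation is illustrative, not the paper's formula.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    scores = np.zeros(X.shape[1])
    for c in np.unique(y):
        pos, neg = X[y == c], X[y != c]
        n1, n2 = len(pos), len(neg)
        if n1 < 2 or n2 < 2:
            continue  # a sample variance needs at least two samples per group
        diff = np.abs(pos.mean(axis=0) - neg.mean(axis=0))
        # Welch standard error: each group's variance is scaled by its own size,
        # so a small class is not swamped by a large one.
        se = np.sqrt(pos.var(axis=0, ddof=1) / n1 + neg.var(axis=0, ddof=1) / n2)
        safe_se = np.where(se > 0, se, 1.0)
        # Zero variance in both groups: infinite score if the means differ, else 0.
        t = np.where(se > 0, diff / safe_se, np.where(diff > 0, np.inf, 0.0))
        scores = np.maximum(scores, t)
    return scores


if __name__ == "__main__":
    # Toy imbalanced data: 2 samples of class 1, 6 samples of class 0.
    X = [[5, 1], [6, 0], [0, 1], [1, 0], [0, 0], [1, 1], [0, 1], [1, 0]]
    y = [1, 1, 0, 0, 0, 0, 0, 0]
    scores = welch_t_scores(X, y)
    ranking = np.argsort(-scores)  # feature indices, best first
    print(scores, ranking)
```

Features would then be ranked by descending score and the top k kept before training a classifier; k is a tuning choice that the abstract does not specify.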
Keywords
data handling; pattern classification; text analysis; class imbalanced data classification; class imbalanced distribution; feature ranking method; feature score; filter based feature selection; statistical t-test technique; text classification; Computer science; Frequency measurement; Gain measurement; Standards; Support vector machines; Text categorization; Training; feature assessment; filter-based feature selection; imbalance class distribution; text classification
fLanguage
English
Publisher
IEEE
Conference_Titel
Computer Science and Engineering Conference (ICSEC), 2014 International
Conference_Location
Khon Kaen
Print_ISBN
978-1-4799-4965-6
Type
conf
DOI
10.1109/ICSEC.2014.6978232
Filename
6978232