DocumentCode :
179806
Title :
A feature score for classifying class-imbalanced data
Author :
Pramokchon, Part ; Piamsa-nga, Punpiti
Author_Institution :
Dept. of Comput. Eng., Kasetsart Univ., Bangkok, Thailand
fYear :
2014
fDate :
July 30 2014-Aug. 1 2014
Firstpage :
409
Lastpage :
414
Abstract :
Feature ranking is a filter-based feature selection approach that is widely used in text classification. However, many feature scores used for ranking yield low classification performance when applied to data in which class sizes differ drastically. We present a feature score based on the statistical t-test, which evaluates the difference between two sample means, to assess the discriminating power of each individual feature. The t-test-based feature score can be used even when the numbers of samples in each class are drastically unequal; the score is therefore insensitive to class-imbalanced distributions. The multi-class text classification performance of the proposed feature score is compared with that of seven modern feature scores: CMFS, IG, CHI, DF, GINI, OCFS, and DIA. The results show that the proposed feature score achieves a micro-averaged F1 of 94.2% on the Reuters-21578 benchmark dataset, whereas none of the other scores exceeds 80%.
Keywords :
data handling; pattern classification; text analysis; class imbalanced data classification; class imbalanced distribution; feature ranking method; feature score; filter based feature selection; statistical t-test technique; text classification; Computer science; Frequency measurement; Gain measurement; Standards; Support vector machines; Text categorization; Training; feature assessment; filter-based feature selection; imbalance class distribution; text classification
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
Computer Science and Engineering Conference (ICSEC), 2014 International
Conference_Location :
Khon Kaen
Print_ISBN :
978-1-4799-4965-6
Type :
conf
DOI :
10.1109/ICSEC.2014.6978232
Filename :
6978232