• DocumentCode
    179806
  • Title
    A feature score for classifying class-imbalanced data
  • Author
    Pramokchon, Part ; Piamsa-nga, Punpiti
  • Author_Institution
    Dept. of Comput. Eng., Kasetsart Univ., Bangkok, Thailand
  • fYear
    2014
  • fDate
    July 30 2014-Aug. 1 2014
  • Firstpage
    409
  • Lastpage
    414
  • Abstract
    Feature ranking is a filter-based feature selection approach that is widely used in text classification. However, many feature scores used for ranking yield low classification performance when applied to data in which class sizes differ drastically. We present a feature score based on the statistical t-test, which evaluates the difference between two sample means, to assess the discriminating power of each individual feature. The t-test-based feature score can be applied even when the numbers of samples in each class are drastically unequal. Therefore, the score is insensitive to the problem of class-imbalanced distribution. The multi-class text classification performance of the proposed feature score is compared with that of seven modern feature scores: CMFS, IG, CHI, DF, GINI, OCFS, and DIA. The results show that the proposed feature score achieves a micro-averaged F1 of 94.2% on the Reuters-21578 benchmark dataset, whereas none of the other scores exceeds 80%.
  • Keywords
    data handling; pattern classification; text analysis; class imbalanced data classification; class imbalanced distribution; feature ranking method; feature score; filter based feature selection; statistical t-test technique; text classification; Computer science; Frequency measurement; Gain measurement; Standards; Support vector machines; Text categorization; Training; feature assessment; filter-based feature selection; imbalance class distribution
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Computer Science and Engineering Conference (ICSEC), 2014 International
  • Conference_Location
    Khon Kaen
  • Print_ISBN
    978-1-4799-4965-6
  • Type
    conf
  • DOI
    10.1109/ICSEC.2014.6978232
  • Filename
    6978232