  • DocumentCode
    2042211
  • Title
    A new feature weighting method based on probability distribution in imbalanced text classification
  • Author
    Chu, Leilei; Gao, Hui; Chang, Wenbo
  • Author_Institution
    Fac. of Sci., Xi'an Jiaotong Univ., Xi'an, China
  • Volume
    5
  • fYear
    2010
  • fDate
    10-12 Aug. 2010
  • Firstpage
    2335
  • Lastpage
    2339
  • Abstract
    Many real-world text classification tasks involve imbalanced training examples. Categories with fewer examples are under-represented, and their classifiers often perform far below a satisfactory level. We propose a new approach that uses a probability distribution to assign feature weights, and we apply it to a Naive Bayes classifier. The method is evaluated in experiments on the FuDan Chinese Corpus. The experimental results show a significant improvement on imbalanced datasets, while performance on balanced datasets is not compromised. Our approach offers a simple and effective way to boost the performance of text classification on skewed datasets.
  • Keywords
    Bayes methods; pattern classification; statistical distributions; text analysis; FuDan Chinese Corpus; Naive Bayes classifier; feature weight; imbalanced text classification; probability distribution; Accuracy; Machine learning; Probability distribution; Tagging; Text categorization; Training; Feature weighting; Imbalanced text classification; Naive Bayes; Skew
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Title
    Fuzzy Systems and Knowledge Discovery (FSKD), 2010 Seventh International Conference on
  • Conference_Location
    Yantai, Shandong
  • Print_ISBN
    978-1-4244-5931-5
  • Type
    conf
  • DOI
    10.1109/FSKD.2010.5569830
  • Filename
    5569830
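The abstract states that a probability distribution is used to assign feature weights inside a Naive Bayes classifier, but it does not give the weighting formula. The following is therefore only a minimal sketch, assuming a multinomial Naive Bayes in which each occurrence of term t contributes w_t * log P(t|c) to the class score. The function `distribution_weights` is a hypothetical stand-in, not the authors' method: it normalizes the smoothed P(t|c) over classes (so large classes do not dominate through raw counts) and sets w_t = 2 minus the normalized entropy of that distribution, giving weight 1 (plain Naive Bayes) to evenly spread terms and up to 2 to class-specific terms. All function names are illustrative.

```python
import math
from collections import Counter


def train_nb(docs, labels, alpha=1.0):
    """Estimate log P(c) and Laplace-smoothed log P(t|c) from tokenized docs."""
    classes = sorted(set(labels))
    vocab = sorted({t for d in docs for t in d})
    class_counts = Counter(labels)
    term_counts = {c: Counter() for c in classes}
    for doc, c in zip(docs, labels):
        term_counts[c].update(doc)
    log_prior = {c: math.log(class_counts[c] / len(labels)) for c in classes}
    log_lik = {}
    for c in classes:
        total = sum(term_counts[c].values()) + alpha * len(vocab)
        log_lik[c] = {t: math.log((term_counts[c][t] + alpha) / total) for t in vocab}
    return classes, vocab, log_prior, log_lik


def distribution_weights(classes, vocab, log_lik):
    """HYPOTHETICAL weighting (not the paper's formula): 2 - normalized entropy
    of the term's class distribution q_c = P(t|c) / sum_c' P(t|c')."""
    k = len(classes)
    log_k = math.log(k) if k > 1 else 0.0
    weights = {}
    for t in vocab:
        probs = [math.exp(log_lik[c][t]) for c in classes]
        z = sum(probs)
        q = [p / z for p in probs]
        entropy = -sum(p * math.log(p) for p in q if p > 0)
        # Evenly spread term -> weight 1; term concentrated in one class -> weight near 2.
        weights[t] = 2.0 - entropy / log_k if k > 1 else 1.0
    return weights


def predict(doc, classes, log_prior, log_lik, weights):
    """Feature-weighted NB decision rule; terms unseen in training are skipped."""
    counts = Counter(t for t in doc if t in weights)
    scores = {
        c: log_prior[c] + sum(weights[t] * n * log_lik[c][t] for t, n in counts.items())
        for c in classes
    }
    return max(scores, key=scores.get)


if __name__ == "__main__":
    # Toy, imbalanced training set: two "econ" documents, one "sport" document.
    docs = [["stock", "market", "rise"], ["stock", "bank", "market"], ["goal", "match", "team"]]
    labels = ["econ", "econ", "sport"]
    classes, vocab, log_prior, log_lik = train_nb(docs, labels)
    weights = distribution_weights(classes, vocab, log_lik)
    print(predict(["match", "team", "goal"], classes, log_prior, log_lik, weights))  # sport
    print(predict(["stock", "market"], classes, log_prior, log_lik, weights))  # econ
```

Because the weight only rescales each term's log-likelihood, setting every w_t to 1 recovers standard multinomial Naive Bayes, which makes it easy to swap in a different distribution-based weighting for comparison.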