• DocumentCode
    477755
  • Title

    A Hybrid Re-sampling Method for SVM Learning from Imbalanced Data Sets

  • Author

    Li, Peng ; Qiao, Pei-Li ; Liu, Yuan-Chao

  • Volume
    2
  • fYear
    2008
  • fDate
    18-20 Oct. 2008
  • Firstpage
    65
  • Lastpage
    69
  • Abstract
    Support vector machines (SVMs) have been widely studied and have shown success in many application fields. However, the performance of an SVM drops significantly when it learns from imbalanced data sets in which negative instances greatly outnumber positive instances. This paper analyzes the intrinsic factors behind this failure and proposes a suitable re-sampling method. We re-sample the imbalanced data using variable SOM clustering to overcome the flaws of traditional re-sampling methods, such as serious randomness, subjective interference, and information loss. We then prune the training set using the K-NN rule to resolve data confusion, which improves the generalization ability of the SVM. Experimental results show that our method clearly improves the performance of the SVM on imbalanced data sets.
  • Keywords
    data handling; pattern clustering; self-organising feature maps; support vector machines; K-NN rule; SVM learning; data confusion; hybrid re-sampling method; imbalanced data sets; information loss; subjective interference; support vector machine; variable SOM clustering; Application software; Computer science; Educational institutions; Failure analysis; Fuzzy systems; Interference; Intrusion detection; Machine learning; Machine learning algorithms; Support vector machines; Imbalanced data sets; Re-sampling; Support Vector Machine;
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Fuzzy Systems and Knowledge Discovery, 2008. FSKD '08. Fifth International Conference on
  • Conference_Location
    Shandong
  • Print_ISBN
    978-0-7695-3305-6
  • Type

    conf

  • DOI
    10.1109/FSKD.2008.407
  • Filename
    4666081
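
The pruning step mentioned in the abstract (removing confusing instances with the K-NN rule) can be illustrated with a minimal sketch. This is not the authors' implementation: the record does not give the exact rule, so the sketch assumes a Wilson-style editing rule in which an instance is dropped when its label disagrees with the majority label of its k nearest neighbours. The function name `knn_prune`, the `protect` parameter (which exempts the minority class from removal), and the toy data are all hypothetical. The SOM-based re-sampling and the SVM training are not shown.

```python
from collections import Counter
from math import dist


def knn_prune(points, labels, k=3, protect=None):
    """Drop instances whose label disagrees with the majority label of
    their k nearest neighbours (assumed editing rule, not the paper's).

    Instances whose label equals `protect` are always kept, so the
    minority class is never thinned out.
    """
    kept_points, kept_labels = [], []
    for i, (p, y) in enumerate(zip(points, labels)):
        if y == protect:
            kept_points.append(p)
            kept_labels.append(y)
            continue
        # Indices of the k closest other instances (Euclidean distance).
        neighbours = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: dist(p, points[j]),
        )[:k]
        votes = Counter(labels[j] for j in neighbours)
        majority_label = votes.most_common(1)[0][0]
        if majority_label == y:
            kept_points.append(p)
            kept_labels.append(y)
    return kept_points, kept_labels


# Toy data: label 0 = negative (majority), label 1 = positive (minority).
points = [
    (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),  # negative cluster
    (5.0, 5.0), (5.0, 6.0), (6.0, 5.0),              # positive cluster
    (5.5, 5.5),                                      # stray negative
]
labels = [0, 0, 0, 0, 1, 1, 1, 0]

pruned_points, pruned_labels = knn_prune(points, labels, k=3, protect=1)
```

In this toy set, the stray negative at (5.5, 5.5) sits inside the positive cluster, so its three nearest neighbours are all positive and it is removed. An odd `k` avoids tied votes in the two-class case.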