• DocumentCode
    2843833
  • Title

    Addressing Data-Complexity for Imbalanced Data-Sets: A Preliminary Study on the Use of Preprocessing for C4.5

  • Author

    Luengo, Julián ; Fernandez, Alicia ; Herrera, Francisco

  • Author_Institution
    Dept. of Comput. Sci. & A.I., Univ. of Granada, Granada, Spain
  • fYear
    2009
  • fDate
    Nov. 30 2009-Dec. 2 2009
  • Firstpage
    523
  • Lastpage
    528
  • Abstract
    In this work we analyse the behaviour of the C4.5 classification method on a collection of imbalanced data-sets. We consider two data-complexity metrics, the "maximum Fisher's discriminant ratio" and the "nonlinearity of the 1NN classifier", to analyse the effect of preprocessing (oversampling in this case) as a way of dealing with the imbalance problem. To that end, we run C4.5 over a wide range of imbalanced data-sets built from real data and try to extract behaviour patterns from the results. We obtain rules that describe both good and bad behaviour of C4.5, with the original data-sets (no preprocessing) and with preprocessing applied. These rules allow us to determine the effect of preprocessing, to predict the response of C4.5 to preprocessing from the data-set's complexity metrics before it is applied, and thus to establish when preprocessing would be useful.
  • Keywords
    pattern classification; 1NN classifier; C4.5 classification method; data complexity metrics; imbalanced data sets; maximum Fisher's discriminant ratio; Application software; Classification tree analysis; Computer science; Data mining; Decision trees; Density measurement; Geometry; Intelligent systems; Pattern analysis; Topology; C4.5; Classification; Data complexity; Imbalanced Data-sets; Oversampling
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Intelligent Systems Design and Applications, 2009. ISDA '09. Ninth International Conference on
  • Conference_Location
    Pisa
  • Print_ISBN
    978-1-4244-4735-0
  • Electronic_ISBN
    978-0-7695-3872-3
  • Type

    conf

  • DOI
    10.1109/ISDA.2009.233
  • Filename
    5364953