• DocumentCode
    916918
  • Title

    A Kernel-Based Two-Class Classifier for Imbalanced Data Sets

  • Author

    Hong, Xia ; Chen, Sheng ; Harris, Chris J.

  • Author_Institution
    Sch. of Syst. Eng., Reading Univ.
  • Volume
    18
  • Issue
    1
  • fYear
    2007
  • Firstpage
    28
  • Lastpage
    41
  • Abstract
    Many kernel classifier construction algorithms adopt classification accuracy as the performance metric in model evaluation. Moreover, equal weighting is often applied to each data sample in parameter estimation. These modeling practices often become problematic if the data sets are imbalanced. We present a kernel classifier construction algorithm using orthogonal forward selection (OFS) in order to optimize the model generalization for imbalanced two-class data sets. This kernel classifier identification algorithm is based on a new regularized orthogonal weighted least squares (ROWLS) estimator and the model selection criterion of maximal leave-one-out area under the curve (LOO-AUC) of the receiver operating characteristics (ROCs). It is shown that, owing to the orthogonalization procedure, the LOO-AUC can be calculated via an analytic formula based on the new ROWLS parameter estimator, without actually splitting the estimation data set. The proposed algorithm can achieve minimal computational expense via a set of forward recursive updating formulas when searching for model terms with maximal incremental LOO-AUC value. Numerical examples are used to demonstrate the efficacy of the algorithm.
  • Keywords
    least squares approximations; parameter estimation; pattern classification; sensitivity analysis; imbalanced data sets; kernel-based two-class classifier; model selection criterion; orthogonal forward selection; parameter estimation; receiver operating characteristics; regularized orthogonal weighted least squares estimation; Algorithm design and analysis; Classification algorithms; Computational efficiency; Councils; Data analysis; Kernel; Least squares approximation; Measurement; Parameter estimation; Support vector machines; Forward selection; imbalanced data sets; kernel classifier; leave-one-out (LOO) cross validation; receiver operating characteristics (ROCs); Algorithms; Artificial Intelligence; Cluster Analysis; Computer Simulation; Computing Methodologies; Databases, Factual; Information Storage and Retrieval; Pattern Recognition, Automated;
  • fLanguage
    English
  • Journal_Title
    Neural Networks, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1045-9227
  • Type

    jour

  • DOI
    10.1109/TNN.2006.882812
  • Filename
    4049823
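
The abstract's point that classification accuracy can mislead on imbalanced data, while the area under the ROC curve (AUC) does not, can be illustrated with a small sketch. This is not the paper's OFS/ROWLS algorithm or its analytic LOO-AUC formula; the data set, the scores, and the helper names `accuracy` and `auc` are hypothetical and chosen only for illustration. AUC is computed here by the standard pairwise (Mann-Whitney) count.

```python
def accuracy(labels, scores, threshold=0.0):
    """Fraction of samples whose thresholded score matches its +1/-1 label."""
    predictions = [1 if s > threshold else -1 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via the pairwise (Mann-Whitney) statistic.

    Counts the fraction of (positive, negative) pairs in which the positive
    sample gets the higher score; tied pairs count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical imbalanced data set: 95 negatives followed by 5 positives.
labels = [-1] * 95 + [1] * 5

# Model A (assumed): gives every sample the same negative score.
scores_a = [-1.0] * 100

# Model B (assumed): still below the threshold everywhere, but ranks
# every positive sample above every negative one.
scores_b = [-1.0] * 95 + [-0.5] * 5

acc_a, auc_a = accuracy(labels, scores_a), auc(labels, scores_a)
acc_b, auc_b = accuracy(labels, scores_b), auc(labels, scores_b)

print(f"Model A: accuracy={acc_a:.2f}, AUC={auc_a:.2f}")
print(f"Model B: accuracy={acc_b:.2f}, AUC={auc_b:.2f}")
```

Both models reach the same accuracy because neither ever predicts the minority class, yet only the AUC separates the model that ranks the minority samples correctly from the one that carries no ranking information.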