• DocumentCode
    61595
  • Title
    Ranking Instances by Maximizing the Area under ROC Curve
  • Author
    Guvenir, H.A. ; Kurtcephe, M.
  • Author_Institution
    Dept. of Comput. Eng., Bilkent Univ., Ankara, Turkey
  • Volume
    25
  • Issue
    10
  • fYear
    2013
  • fDate
    Oct. 2013
  • Firstpage
    2356
  • Lastpage
    2366
  • Abstract
    In recent years, the problem of learning a real-valued function that induces a ranking over an instance space has gained importance in the machine learning literature. Here, we propose a supervised algorithm that learns a ranking function, called ranking instances by maximizing the area under the ROC curve (RIMARC). Since the area under the ROC curve (AUC) is a widely accepted performance measure for evaluating the quality of ranking, the algorithm aims to maximize the AUC value directly. For a single categorical feature, we show the necessary and sufficient condition that any ranking function must satisfy to achieve the maximum AUC. We also sketch a method for discretizing a continuous feature so that it, too, reaches the maximum AUC. RIMARC uses a heuristic to extend this maximization to all features of a data set. The ranking function learned by the RIMARC algorithm is in a human-readable form; therefore, it provides valuable information to domain experts for decision making. The performance of RIMARC is evaluated on many real-life data sets against different state-of-the-art algorithms. Evaluations using the AUC metric show that RIMARC performs significantly better than other similar methods.
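    The abstract does not spell out the condition for a single categorical feature, so the following is only a minimal sketch under an assumption: each categorical value is scored by the fraction of positive instances that carry it, a standard way to order values so that AUC on the training data is maximal. The function names (`auc`, `positive_rate_scores`, `best_auc_by_brute_force`) and the toy data are hypothetical and are not taken from the paper; the sketch is not the RIMARC algorithm itself.

```python
from collections import defaultdict
from itertools import permutations


def auc(scores, labels):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    A tie between a positive and a negative score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def positive_rate_scores(values, labels):
    """Assumed scoring rule: score each categorical value by its positive rate."""
    counts = defaultdict(lambda: [0, 0])  # value -> [positives, total]
    for v, y in zip(values, labels):
        counts[v][0] += y
        counts[v][1] += 1
    return {v: p / t for v, (p, t) in counts.items()}


def best_auc_by_brute_force(values, labels):
    """Try every strict ordering of the distinct values; return the best AUC."""
    distinct = sorted(set(values))
    best = 0.0
    for ranks in permutations(range(len(distinct))):
        score_of = dict(zip(distinct, ranks))
        best = max(best, auc([score_of[v] for v in values], labels))
    return best


# Hypothetical toy data: one categorical feature, binary labels.
values = ["a", "a", "a", "b", "b", "b", "c", "c", "c", "c"]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 1]

rates = positive_rate_scores(values, labels)  # a: 2/3, b: 1/3, c: 1/4
rate_auc = auc([rates[v] for v in values], labels)
print(rate_auc)                                  # 0.6875
print(best_auc_by_brute_force(values, labels))   # 0.6875
```

    On this toy data, no other ordering of the three values yields a higher AUC than ordering by positive rate, which is the behavior the abstract's maximum-AUC condition is concerned with. The scores are also easy to read, since each one is simply the proportion of positive instances for that value.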
  • Keywords
    data mining; decision making; learning (artificial intelligence); optimisation; AUC metric; AUC value; RIMARC algorithm; ROC curve area maximization; continuous feature discretization; human-readable form; machine learning literature; necessary and sufficient condition; performance evaluation; performance measure; ranking instance space; ranking quality evaluation; real-valued function learning; receiver operating characteristic analysis; single categorical feature; supervised algorithm; Algorithm design and analysis; Machine learning; Machine learning algorithms; Measurement; Nickel; Ranking; Training; Training data; decision support
  • fLanguage
    English
  • Journal_Title
    Knowledge and Data Engineering, IEEE Transactions on
  • Publisher
    IEEE
  • ISSN
    1041-4347
  • Type
    jour
  • DOI
    10.1109/TKDE.2012.214
  • Filename
    6338929