• DocumentCode
    3124013
  • Title
    Confidence in Predictions from Random Tree Ensembles
  • Author
    Bhattacharyya, Siddhartha
  • Author_Institution
    Coll. of Bus. Adm., Univ. of Illinois, Chicago, IL, USA
  • fYear
    2011
  • fDate
    11-14 Dec. 2011
  • Firstpage
    71
  • Lastpage
    80
  • Abstract
    Obtaining an indication of confidence of predictions is desirable for many data mining applications. Such confidence levels, together with the predicted value, can inform on the certainty or extent of reliability that may be associated with the prediction. This can be useful, for example, where model outputs are used in making potentially costly decisions, and one may then focus on the higher confidence predictions, and in general across risk sensitive applications. The conformal prediction framework presents a novel approach for complementing predictions from machine learning algorithms with valid confidence measures. Confidence levels are obtained from the underlying algorithm, using a non-conformity measure which indicates how 'atypical' a given example set is. The non-conformity measure is key to determining the usefulness and efficiency of the approach. This paper considers inductive conformal prediction in the context of random tree ensembles like random forests, which have been noted to perform favorably across problems. Focusing on classification tasks, and considering realistic data contexts including class imbalance, we develop non-conformity measures for assessing the confidence of predicted class labels from random forests. We examine the performance of these measures on multiple datasets. Results demonstrate the usefulness and validity of the measures, their relative differences, and highlight the effectiveness of conformal prediction random forests for obtaining predictions with associated confidence.
  • Keywords
    data mining; pattern classification; classification tasks; conformal prediction framework; data mining applications; prediction confidence; random forests; random tree ensembles; Calibration; Extraterrestrial measurements; Predictive models; Training; Training data; Vegetation; Confidence; classification; conformal prediction
  • fLanguage
    English
  • Publisher
    IEEE
  • Conference_Titel
    Data Mining (ICDM), 2011 IEEE 11th International Conference on
  • Conference_Location
    Vancouver, BC
  • ISSN
    1550-4786
  • Print_ISBN
    978-1-4577-2075-8
  • Type
    conf
  • DOI
    10.1109/ICDM.2011.41
  • Filename
    6137211
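
  • Illustration
    The abstract describes inductive conformal prediction on top of a random forest, driven by a non-conformity measure. The sketch below shows the general mechanism only. The abstract does not state the paper's own non-conformity measures, so the one used here (1 minus the share of trees voting for a label) is an assumption chosen for illustration, as are all function names and the toy numbers. The tree vote fractions are taken as given rather than produced by a fitted forest.

```python
from typing import Dict, List, Sequence


def nonconformity(vote_fractions: Dict[str, float], label: str) -> float:
    """Assumed measure (not from the paper): 1 minus the share of trees voting for `label`."""
    return 1.0 - vote_fractions.get(label, 0.0)


def p_value(calibration_scores: Sequence[float], test_score: float) -> float:
    """Inductive conformal p-value for one candidate label.

    Counts calibration examples that are at least as atypical as the test
    example; the +1 terms count the test example itself.
    """
    at_least = sum(1 for s in calibration_scores if s >= test_score)
    return (at_least + 1) / (len(calibration_scores) + 1)


def prediction_set(
    calibration_scores: Sequence[float],
    vote_fractions: Dict[str, float],
    labels: Sequence[str],
    significance: float,
) -> List[str]:
    """Keep every label whose p-value exceeds the significance level."""
    return [
        lab
        for lab in labels
        if p_value(calibration_scores, nonconformity(vote_fractions, lab)) > significance
    ]


# Toy data (hypothetical): non-conformity scores of nine held-out calibration
# examples, each computed with its true label.
cal = [0.05, 0.10, 0.15, 0.20, 0.30, 0.35, 0.40, 0.60, 0.80]
# Hypothetical forest output for one new example: 75% of trees vote "pos".
votes = {"pos": 0.75, "neg": 0.25}

p_pos = p_value(cal, nonconformity(votes, "pos"))
p_neg = p_value(cal, nonconformity(votes, "neg"))
narrow = prediction_set(cal, votes, ["pos", "neg"], significance=0.25)
wide = prediction_set(cal, votes, ["pos", "neg"], significance=0.10)
print(p_pos, p_neg, narrow, wide)
```

    A lower significance level demands higher confidence and therefore tends to yield larger prediction sets. For imbalanced classes, one common variant computes the p-value for each label against calibration scores from that class only; whether the paper does so is not stated in the abstract.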