Consequences of Variability in Classifier Performance Estimates

Author

Raeder, Troy ; Hoens, T. Ryan ; Chawla, Nitesh V.

fYear

2010

fDate

13-17 Dec. 2010

Firstpage

421

Lastpage

430

Abstract

The prevailing approach to evaluating classifiers in the machine learning community involves comparing the performance of several algorithms over a series of usually unrelated data sets. Beyond this common framework, however, methodologies vary widely along many dimensions. We show that, depending on the stability and similarity of the algorithms being compared, these sometimes-arbitrary methodological choices can have a significant impact on the conclusions of any study, including the results of statistical tests. In particular, we show that the performance metrics and data sets used, the type of cross-validation employed, and the number of iterations of cross-validation run have a significant, and often predictable, effect. Based on these results, we offer a series of recommendations for achieving consistent, reproducible results in classifier performance comparisons.

Keywords

learning (artificial intelligence); pattern classification; classifier performance estimation; machine learning; reproducibility; variability; classification; evaluation

fLanguage

English

Publisher

IEEE

Conference_Title

Data Mining (ICDM), 2010 IEEE 10th International Conference on

Conference_Location

Sydney, NSW

ISSN

1550-4786

Print_ISBN

978-1-4244-9131-5

Electronic_ISBN

1550-4786

Type

conf

DOI

10.1109/ICDM.2010.110

Filename

5693996