DocumentCode :
2208205
Title :
Consequences of Variability in Classifier Performance Estimates
Author :
Raeder, Troy ; Hoens, T. Ryan ; Chawla, Nitesh V.
fYear :
2010
fDate :
13-17 Dec. 2010
Firstpage :
421
Lastpage :
430
Abstract :
The prevailing approach to evaluating classifiers in the machine learning community involves comparing the performance of several algorithms over a series of usually unrelated data sets. Beyond this, however, evaluation methodologies vary wildly along many dimensions. We show that, depending on the stability and similarity of the algorithms being compared, these sometimes arbitrary methodological choices can have a significant impact on the conclusions of any study, including the results of statistical tests. In particular, we show that the performance metrics and data sets used, the type of cross-validation employed, and the number of cross-validation iterations run all have a significant, and often predictable, effect. Based on these results, we offer a series of recommendations for achieving consistent, reproducible results in classifier performance comparisons.
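Illustration (not from the paper): the short Python sketch below shows the kind of variability the abstract refers to, namely that repeating cross-validation with different fold assignments changes the performance estimate for the same classifier on the same data. The use of scikit-learn, a random forest, AUC as the metric, 10 folds, and 10 repetitions are all assumed, illustrative choices.

# Minimal sketch (assumed setup): spread of AUC estimates across
# repetitions of 10-fold stratified cross-validation on one data set.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic, mildly imbalanced binary classification problem.
X, y = make_classification(n_samples=500, n_features=20,
                           weights=[0.8, 0.2], random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0)

# Repeat cross-validation with different shuffles and record the
# mean AUC of each repetition.
estimates = []
for seed in range(10):
    cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=seed)
    scores = cross_val_score(clf, X, y, cv=cv, scoring="roc_auc")
    estimates.append(scores.mean())

print("AUC estimates across repetitions:", np.round(estimates, 4))
print("spread (max - min): %.4f" % (max(estimates) - min(estimates)))

The spread printed at the end is nonzero even though the classifier and data are fixed, which is why the paper argues that the cross-validation scheme and the number of repetitions must be reported and chosen carefully when comparing algorithms.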
Keywords :
learning (artificial intelligence); pattern classification; classifier performance estimation; machine learning; variability; classification; evaluation; reproducibility
fLanguage :
English
Publisher :
ieee
Conference_Titel :
2010 IEEE 10th International Conference on Data Mining (ICDM)
Conference_Location :
Sydney, NSW
ISSN :
1550-4786
Print_ISBN :
978-1-4244-9131-5
Electronic_ISBN :
1550-4786
Type :
conf
DOI :
10.1109/ICDM.2010.110
Filename :
5693996