Selecting training instances for supervised classification

Author

Roiger, Richard; Cornel, Lee

Author_Institution

Department of Computer and Information Sciences

fYear

1996

fDate

15 Nov. 1996

Firstpage

150

Lastpage

155

Abstract

Several experimental studies have tested the relative merits of various supervised machine learning models. Comparisons have been made along dimensions that include model complexity, prediction accuracy, training set size, and training time. Only limited work has been done to study the effect of training set exemplar typicality on model performance. We present experimental results obtained in testing C4.5, SX-WEB, a backpropagation neural network, and linear discriminant analysis using a real-valued and a mixed form of a medical data set. We generated training sets of highly typical, widely varied, and atypical exemplars for both data sets. We tested the classification accuracy of each model using the generated training sets. Test set accuracy levels ranged between 76% and 86% when each model was trained with typical or varied training sets. The accuracy levels for C4.5, the backpropagation neural network, and discriminant analysis dropped significantly when atypical training sets were used. In contrast, with the exception of one test, SX-WEB was unaffected by training set choice. When comparing the correctness of each model, SX-WEB showed the best overall performance. We conclude this paper with directions for future research.

Keywords

Artificial intelligence; Artificial neural networks; Backpropagation; Linear discriminant analysis; Machine learning; Medical tests; Neural networks; Predictive models; Supervised learning; Testing

fLanguage

English

Publisher

ieee

Conference_Titel

ISAI/IFIS 1996. Mexico-USA Collaboration in Intelligent Systems Technologies. Proceedings

Conference_Location

IEEE

Print_ISBN

968-29-9437-3

Type

conf

Filename

864112