DocumentCode :
1796678
Title :
Accurate and interpretable regression trees using oracle coaching
Author :
Johansson, Ulf ; Sonstrod, Cecilia ; Konig, Rikard
Author_Institution :
Sch. of Bus. & IT, Univ. of Boras, Boras, Sweden
fYear :
2014
fDate :
9-12 Dec. 2014
Firstpage :
194
Lastpage :
201
Abstract :
In many real-world scenarios, predictive models need to be interpretable, which rules out many machine learning techniques known to produce very accurate models, e.g., neural networks, support vector machines and all ensemble schemes. Most often, tree models or rule sets are used instead, typically resulting in significantly lower predictive performance. The overall purpose of oracle coaching is to reduce this accuracy vs. comprehensibility trade-off by producing interpretable models optimized for the specific production set at hand. The method requires production set inputs to be present when generating the predictive model, a demand fulfilled in most, but not all, predictive modeling scenarios. In oracle coaching, a highly accurate, but opaque, model is first induced from the training data. This model (“the oracle”) is then used to label both the training instances and the production instances. Finally, interpretable models are trained using different combinations of the resulting data sets. In this paper, oracle coaching produces regression trees, using neural networks and random forests as oracles. The experiments, using 32 publicly available data sets, show that oracle coaching leads to significantly improved predictive performance compared to standard induction. In addition, it is shown that a highly accurate opaque model can be successfully used as a pre-processing step to reduce the noise typically present in data, even in situations where production inputs are not available. In fact, just augmenting or replacing the training data with another copy of the training set, but with the predictions from the opaque model as targets, produced significantly more accurate and/or more compact regression trees.
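The procedure described in the abstract can be illustrated with a minimal, dependency-free Python sketch. This is not the authors' implementation: the bagged ensemble of deep trees is only a stand-in for the paper's neural network and random forest oracles, the small greedy tree is a generic regression tree, and every function name, hyperparameter, and data set combination below is an assumption made for illustration.

```python
import random


def _mean(values):
    return sum(values) / len(values)


def _sse(values):
    m = _mean(values)
    return sum((v - m) ** 2 for v in values)


def _subset(X, y, idx):
    return [X[i] for i in idx], [y[i] for i in idx]


def fit_tree(X, y, max_depth=3, min_leaf=5):
    """Greedy regression tree (illustrative). A leaf is a float;
    an internal node is (feature, threshold, left_subtree, right_subtree)."""
    if max_depth == 0 or len(y) < 2 * min_leaf:
        return _mean(y)
    best = None  # (cost, feature, threshold, left_indices, right_indices)
    for j in range(len(X[0])):
        values = sorted({row[j] for row in X})
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if min(len(left), len(right)) < min_leaf:
                continue
            cost = _sse([y[i] for i in left]) + _sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:  # no admissible split
        return _mean(y)
    _, j, t, left, right = best
    return (j, t,
            fit_tree(*_subset(X, y, left), max_depth - 1, min_leaf),
            fit_tree(*_subset(X, y, right), max_depth - 1, min_leaf))


def predict(tree, row):
    while isinstance(tree, tuple):
        j, t, left, right = tree
        tree = left if row[j] <= t else right
    return tree


def fit_bagged_oracle(X, y, n_trees=25, seed=0):
    """Opaque stand-in oracle (assumption): average of deep trees,
    each fit on a bootstrap sample of the training data."""
    rng = random.Random(seed)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]
        trees.append(fit_tree(*_subset(X, y, idx), max_depth=8, min_leaf=2))
    return lambda row: _mean([predict(t, row) for t in trees])


def oracle_coach(X_train, y_train, X_prod, max_depth=3):
    """Induce the oracle, label training and production inputs with it,
    then fit one small tree per data set combination."""
    oracle = fit_bagged_oracle(X_train, y_train)
    y_train_oracle = [oracle(row) for row in X_train]
    y_prod_oracle = [oracle(row) for row in X_prod]
    # The combinations are illustrative; the abstract does not enumerate them.
    datasets = {
        "original": (X_train, y_train),
        "oracle_train": (X_train, y_train_oracle),
        "train_plus_oracle_train": (X_train + X_train, y_train + y_train_oracle),
        "oracle_prod": (X_prod, y_prod_oracle),
        "oracle_train_plus_prod": (X_train + X_prod, y_train_oracle + y_prod_oracle),
    }
    return {name: fit_tree(Xs, ys, max_depth=max_depth)
            for name, (Xs, ys) in datasets.items()}
```

Calling `oracle_coach(X_train, y_train, X_prod)` with lists of numeric feature rows returns one depth-limited tree per combination, each of which can be queried with `predict(tree, row)`. Only the inputs of the production set are used; its true targets are never seen, which matches the requirement stated in the abstract.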
Keywords :
learning (artificial intelligence); regression analysis; trees (mathematics); machine learning techniques; neural networks; oracle coaching; predictive models; random forests; regression trees; rule sets; tree models; Accuracy; Data mining; Data models; Predictive models; Production; Training data; Vegetation; Interpretable models; Oracle coaching; Predictive modeling; Regression trees;
fLanguage :
English
Publisher :
IEEE
Conference_Title :
Computational Intelligence and Data Mining (CIDM), 2014 IEEE Symposium on
Conference_Location :
Orlando, FL
Type :
conf
DOI :
10.1109/CIDM.2014.7008667
Filename :
7008667