DocumentCode :
2281089
Title :
Champion-challenger based predictive model selection
Author :
Nath, Shyam Varan
Author_Institution :
Oracle Corp.
fYear :
2007
fDate :
22-25 March 2007
Firstpage :
254
Lastpage :
254
Abstract :
Selecting appropriate data mining predictive models is a challenging task. While it is easy to evaluate a model against historical data at a given point in time, using a confusion matrix and misclassification rate, it is much harder to ensure that the selected model remains the most effective one after deployment, as newer data comes in. Here we address the issue of how to continually strive for the best model even after a predictive model is deployed for production use. In the champion-challenger based model selection paradigm, the historical data is used to create the best, or champion, predictive model using criteria such as misclassification rate for a given cost matrix. Apart from the champion model, a number of other models are selected that are not as good as the champion in predictive accuracy on the same data. These models are termed challengers to the current champion model. They may differ from the champion model in the underlying predictive algorithm, in the algorithm tuning parameters, or in the model attributes used. The predictive modeling starts with the conventional processes: identifying the business problem that warrants predictive modeling, finding the significant attributes for modeling, and data quality analysis, followed by the actual model building and evaluation of the models. However, the emphasis is not on finding just the top, or champion, model but also on finding the other models that are close to it in performance. The guiding principle is that selecting the best predictive model based on the current set of historical data is not a stamp of approval for eternity. Real-world systems that use predictive modeling are complex and dynamic processes, and they need to incorporate a means of capturing that. When the champion model is deployed in a production system and used for predictions, its results are saved in a table. Likewise, the challenger models are used to score a subset of the data, and their results are saved as well. The predictions of the challenger models do not affect the real-time predictive use of the system. Based on the time intervals for future predictions, when the future time arrives, the actual results are captured for the same instances of data.
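The logging scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the author's implementation: the function names (`score`, `average_cost`), the `sample_rate` parameter, the in-memory dictionaries standing in for the results table, and the `cost_matrix[actual][predicted]` layout are all assumptions made for illustration. Only the champion's predictions are served, while challengers score a random subset and are logged on the side.

```python
import random


def score(instances, champion, challengers, sample_rate=0.5, seed=0):
    """Score instances with the champion; shadow-score a subset with challengers.

    instances:   dict of instance id -> features (hypothetical layout)
    champion:    (name, predict_fn) pair
    challengers: dict of name -> predict_fn
    Returns (served, log). `served` holds only champion predictions, so
    challengers never influence what the production system returns.
    `log` stands in for the results table: model name -> {id: prediction}.
    """
    rng = random.Random(seed)
    champ_name, champ_predict = champion
    log = {champ_name: {}}
    log.update({name: {} for name in challengers})
    served = {}
    for inst_id, features in instances.items():
        prediction = champ_predict(features)
        served[inst_id] = prediction
        log[champ_name][inst_id] = prediction
        # Challengers see only a sampled subset of the data (assumed: random sampling).
        if rng.random() < sample_rate:
            for name, predict in challengers.items():
                log[name][inst_id] = predict(features)
    return served, log


def average_cost(predictions, actuals, cost_matrix):
    """Mean misclassification cost over instances whose actual outcome is known.

    cost_matrix[actual][predicted] gives the cost of that outcome pair.
    Returns None when no logged prediction has a captured actual yet.
    """
    pairs = [(actuals[i], p) for i, p in predictions.items() if i in actuals]
    if not pairs:
        return None
    return sum(cost_matrix[a][p] for a, p in pairs) / len(pairs)
```

Once the actual outcomes arrive, calling `average_cost` on each model's logged predictions gives a like-for-like comparison on the same instances. Promoting a challenger whose cost falls below the champion's would be one possible decision rule, though the abstract as given here does not spell that step out.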
Keywords :
data mining; prediction theory; champion predictive model; champion-challenger based model selection; data mining predictive models; data quality analysis; predictive algorithm; predictive model selection; Accuracy; Costs; Data analysis; Data mining; Iterative algorithms; Prediction algorithms; Predictive models; Production systems; Real time systems; Testing;
fLanguage :
English
Publisher :
IEEE
Conference_Titel :
SoutheastCon, 2007. Proceedings. IEEE
Conference_Location :
Richmond, VA
Print_ISBN :
1-4244-1028-2
Electronic_ISBN :
1-4244-1029-0
Type :
conf
DOI :
10.1109/SECON.2007.342897
Filename :
4147427