Authors:
Lawrence E. Raffalovich, Glenn D. Deane, David Armstrong & Hui-Shien Tsao
Abstract:
Model selection strategies play an important, if not always explicit, role in quantitative research. The inferential
properties of these strategies are largely unknown; there is therefore little basis for recommending (or
avoiding) any particular set of strategies. In this paper, we evaluate several commonly used model selection
procedures [Bayesian information criterion (BIC), adjusted R2, Mallows’ Cp, Akaike information criterion
(AIC), AICc, and stepwise regression] using Monte Carlo simulation of model selection when the true
data generating processes (DGP) are known.
We find that the ability of these selection procedures to include important variables and exclude irrelevant
variables increases with the size of the sample and decreases with the amount of noise in the model. None of
the model selection procedures do well in small samples, even when the true DGP is largely deterministic;
thus, data mining in small samples should be avoided entirely. Instead, the implicit uncertainty in model
specification should be explicitly discussed. In large samples, BIC is better than the other procedures at
correctly identifying most of the generating processes we simulated, and stepwise does almost as well. In
the absence of strong theory, both BIC and stepwise appear to be reasonable model selection strategies
in large samples. Under the conditions simulated, adjusted R2, Mallows’ Cp, AIC, and AICc are clearly
inferior and should be avoided.
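The abstract's simulation logic can be sketched in a minimal form: generate data from a known DGP, fit every candidate variable subset, and record how often each criterion selects the true subset. This is an illustrative sketch, not the authors' actual simulation design; the DGP, sample size, candidate set, and criterion formulas (BIC and AIC on Gaussian residual sums of squares) are assumptions made here for demonstration.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)

def bic(rss, n, k):
    # Gaussian BIC up to an additive constant: n*log(RSS/n) + k*log(n)
    return n * np.log(rss / n) + k * np.log(n)

def aic(rss, n, k):
    # Gaussian AIC up to an additive constant: n*log(RSS/n) + 2k
    return n * np.log(rss / n) + 2 * k

n, reps = 500, 200
true_vars = (0, 1)  # y is generated by x1 and x2; x3 is irrelevant noise
candidates = [c for r in range(1, 4) for c in combinations(range(3), r)]

bic_hits = aic_hits = 0
for _ in range(reps):
    X = rng.normal(size=(n, 3))
    y = X[:, 0] + X[:, 1] + rng.normal(size=n)  # known DGP plus noise
    scores_b, scores_a = {}, {}
    for c in candidates:
        # OLS fit with an intercept for this candidate subset
        Xc = np.column_stack([np.ones(n), X[:, list(c)]])
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        rss = np.sum((y - Xc @ beta) ** 2)
        k = Xc.shape[1]
        scores_b[c] = bic(rss, n, k)
        scores_a[c] = aic(rss, n, k)
    # A "hit" means the criterion's best-scoring subset is the true one
    bic_hits += min(scores_b, key=scores_b.get) == true_vars
    aic_hits += min(scores_a, key=scores_a.get) == true_vars

print("BIC hit rate:", bic_hits / reps)
print("AIC hit rate:", aic_hits / reps)
```

With a large sample and a mostly deterministic DGP like this one, BIC's heavier penalty (log n per parameter versus AIC's constant 2) makes it less likely to admit the irrelevant regressor, consistent with the abstract's finding that BIC dominates in large samples.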
Keywords:
Model selection , Stepwise regression , AIC , BIC