Bootstrap model selection had similar performance for selecting authentic and noise variables compared to backward variable elimination: a simulation study
Researchers have proposed using bootstrap resampling in conjunction with automated variable selection methods to identify predictors of an outcome and to develop parsimonious regression models. Using this method, multiple bootstrap samples are drawn from the original data set. Traditional backward variable elimination is used in each bootstrap sample, and the proportion of bootstrap samples in which each candidate variable is identified as an independent predictor of the outcome is determined. The performance of this method for identifying predictor variables has not been examined. Monte Carlo simulation methods were used to determine the ability of bootstrap model selection methods to correctly identify predictors of an outcome when those variables that are selected for inclusion in at least 50% of the bootstrap samples are included in the final regression model. We compared the performance of the bootstrap model selection method to that of conventional backward variable elimination. Bootstrap model selection tended to result in an approximately equal proportion of selected models being equal to the true regression model compared with the use of conventional backward variable elimination. Bootstrap model selection performed comparatively to backward variable elimination for identifying the true predictors of a binary outcome.