How to cite this WIREs title:
WIREs Comp Stat

Variable selection in the presence of missing data: imputation‐based methods

Variable selection plays an essential role in regression analysis. It identifies the variables that are associated with an outcome and is known to improve the predictive accuracy of the resulting model. Variable selection methods have been studied extensively for fully observed data. In the presence of missing data, however, variable selection must be designed carefully to account for both the missing data mechanism and the statistical technique used to handle the missing values. Because imputation is arguably the most popular way of handling missing data, owing to its ease of use, variable selection methods that are combined with imputation are of particular interest. These methods are used, and valid, under the missing completely at random (MCAR) and missing at random (MAR) assumptions, and they fall largely into three general strategies. The first strategy applies an existing variable selection method to each imputed dataset and then combines the selection results across all imputed datasets. The second strategy applies an existing variable selection method to the imputed datasets stacked into one dataset. The third strategy combines imputation with resampling techniques such as the bootstrap. Despite recent advances, this area remains underdeveloped and offers fertile ground for further research.

WIREs Comput Stat 2017, 9:e1402. doi: 10.1002/wics.1402

This article is categorized under: Statistical and Graphical Methods of Data Analysis > Bootstrap and Resampling
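The three strategies named in the abstract can be illustrated with a small, self-contained sketch. It is not the method of any particular paper and rests on several simplifying assumptions, each labeled in the code. The base selector is exhaustive best-subset search by BIC for a linear model. Imputation is a single-pass stochastic regression imputation, which is a simplified ("improper") stand-in for full multiple imputation because it ignores parameter uncertainty. Results are combined across imputations by a majority vote, stacked rows receive a weight of 1/M, and the bootstrap inclusion threshold is 0.5. All function names and defaults (`select_then_combine`, `select_on_stacked`, `bootstrap_impute_select`, `threshold`) are hypothetical.

```python
import itertools
import numpy as np

def bic_best_subset(X, y, w=None):
    """Exhaustive best-subset selection by BIC (linear model with intercept).
    Optional row weights w; BIC uses the effective sample size sum(w)."""
    n, p = X.shape
    w = np.ones(n) if w is None else w
    n_eff, sw = w.sum(), np.sqrt(w)
    best, best_bic = (), np.inf
    for k in range(p + 1):
        for subset in itertools.combinations(range(p), k):
            Z = np.column_stack([np.ones(n)] + [X[:, j] for j in subset])
            coef, *_ = np.linalg.lstsq(Z * sw[:, None], y * sw, rcond=None)
            rss = np.sum(w * (y - Z @ coef) ** 2)
            bic = n_eff * np.log(rss / n_eff) + (k + 1) * np.log(n_eff)
            if bic < best_bic:
                best, best_bic = subset, bic
    return set(best)

def impute_once(X, y, rng):
    """Assumed imputation model: start from column means, then regress each
    incomplete column on the other columns and y, adding Gaussian noise."""
    X = X.copy()
    miss = np.isnan(X)
    X[miss] = np.take(np.nanmean(X, axis=0), np.where(miss)[1])
    for j in np.where(miss.any(axis=0))[0]:
        obs = ~miss[:, j]
        D = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1), y])
        coef, *_ = np.linalg.lstsq(D[obs], X[obs, j], rcond=None)
        sd = np.std(X[obs, j] - D[obs] @ coef, ddof=D.shape[1])
        X[~obs, j] = D[~obs] @ coef + rng.normal(0.0, sd, (~obs).sum())
    return X

def select_then_combine(X, y, m=10, threshold=0.5, seed=0):
    """Strategy 1: select on each of m imputed datasets, keep variables
    chosen in at least `threshold` of them (majority vote by default)."""
    rng = np.random.default_rng(seed)
    counts = np.zeros(X.shape[1])
    for _ in range(m):
        for j in bic_best_subset(impute_once(X, y, rng), y):
            counts[j] += 1
    return {j for j in range(X.shape[1]) if counts[j] / m >= threshold}

def select_on_stacked(X, y, m=10, seed=0):
    """Strategy 2: stack m imputed datasets; weight rows by 1/m so the
    stacked data carry the information of n observations, not m * n."""
    rng = np.random.default_rng(seed)
    Xs = np.vstack([impute_once(X, y, rng) for _ in range(m)])
    ys = np.tile(y, m)
    return bic_best_subset(Xs, ys, w=np.full(len(ys), 1.0 / m))

def bootstrap_impute_select(X, y, b=20, threshold=0.5, seed=0):
    """Strategy 3: bootstrap the incomplete data, impute each resample,
    select, and keep variables whose inclusion frequency >= threshold."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    counts = np.zeros(p)
    for _ in range(b):
        idx = rng.integers(0, n, n)
        for j in bic_best_subset(impute_once(X[idx], y[idx], rng), y[idx]):
            counts[j] += 1
    return {j for j in range(p) if counts[j] / b >= threshold}

# Simulated example: only x0 and x1 affect y; x1 is MAR given the observed x0.
rng = np.random.default_rng(1)
n = 300
X = rng.normal(size=(n, 5))
y = 1.0 + 2.0 * X[:, 0] + 1.5 * X[:, 1] + rng.normal(size=n)
X[rng.random(n) < 0.5 / (1 + np.exp(-X[:, 0])), 1] = np.nan

for fn in (select_then_combine, select_on_stacked, bootstrap_impute_select):
    print(fn.__name__, sorted(fn(X, y)))
```

The imputation model includes the outcome `y` so that imputed values preserve their association with it. Omitting `y` would bias the selection against the incomplete variable. Best-subset BIC is used only because it is short and deterministic. Any selector, such as the lasso, could replace `bic_best_subset` in all three strategies.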
