How to cite this WIREs title:
WIREs Comp Stat

Variable selection using P‐splines

Selecting, from a large set of variables, those that most influence a response variable is an important problem in statistics. When the assumed regression model involves a nonparametric component, penalized regression techniques, and in particular P‐splines, are among the commonly used methods. The aim of this paper is to provide a brief review of variable selection methods using P‐splines. Starting from multiple linear regression models, with least‐squares regression and Ridge regression, we review standard methods that perform variable selection, such as the LASSO, the nonnegative garrote (NNG), and the SCAD method. We briefly discuss a general framework of penalization and regularization methods. Moving toward more flexible regression models, with one or more nonparametric components, we discuss P‐splines estimation. For some examples of flexible regression models, we then review a few variable selection methods using P‐splines. A brief discussion of grouped regularization techniques and of a robust variable selection method is given. Furthermore, we mention key ingredients of Bayesian approaches, and end the paper by drawing attention to several other issues in variable selection with P‐splines. Throughout the paper we provide some illustrations. WIREs Comput Stat 2015, 7:1–20. doi: 10.1002/wics.1327

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
Statistical Models > Model Selection
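As a rough illustration of the P‐splines estimation mentioned in the abstract, the following sketch fits a single smooth function with a B‐spline basis on equally spaced knots and a difference penalty on adjacent coefficients. It covers only the basic smoothing building block, not any of the variable selection methods the paper reviews. The function names `bspline_basis` and `pspline_fit`, the default settings, and the simulated data are assumptions made for this example and do not come from the paper.

```python
import numpy as np
from scipy.interpolate import BSpline


def bspline_basis(x, n_segments=20, degree=3):
    """Design matrix of equally spaced B-splines covering [min(x), max(x)]."""
    lo, hi = x.min(), x.max()
    h = (hi - lo) / n_segments
    # Equally spaced knots, extended `degree` steps beyond each boundary.
    knots = lo + h * np.arange(-degree, n_segments + degree + 1)
    n_basis = n_segments + degree
    # Identity coefficients make column j equal to the j-th basis function.
    return BSpline(knots, np.eye(n_basis), degree)(x)


def pspline_fit(x, y, lam=1.0, n_segments=20, degree=3, diff_order=2):
    """Minimize ||y - B a||^2 + lam * ||D a||^2 over the coefficients a."""
    B = bspline_basis(x, n_segments, degree)
    # D takes differences of order `diff_order` of adjacent coefficients.
    D = np.diff(np.eye(B.shape[1]), n=diff_order, axis=0)
    alpha = np.linalg.solve(B.T @ B + lam * D.T @ D, B.T @ y)
    return B @ alpha, alpha


# Illustrative simulated data (not from the paper).
rng = np.random.default_rng(0)
x = np.sort(rng.uniform(0.0, 1.0, 200))
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.size)
fitted, alpha = pspline_fit(x, y, lam=1.0)
print(fitted[:5])
```

Larger values of `lam` pull the fit toward a polynomial of degree `diff_order - 1`, while `lam` close to zero approaches an unpenalized B‐spline regression.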
(a) OLS estimates versus Ridge and LASSO estimates. (b) OLS estimates versus estimates using the NNG method, a method with a hard thresholding penalty and a method with the SCAD penalty.
Los Angeles ozone data with outlier: estimated coefficients as a function of the regularization parameter.
Los Angeles ozone data without outlier: estimated coefficients as a function of the regularization parameter.
Los Angeles ozone data: estimated shrinkage factors as a function of the regularization parameter, for OLS and P‐splines NNG, respectively.
Los Angeles ozone data. Partial residuals versus the covariate values Xij, together with the fitted curve from the P‐splines estimation method (red solid curve) and the least‐squares fit for a linear model (blue dashed line).
Boston housing data: estimated coefficients as a function of the regularization parameter. Zoom in of a selection from Figures and .
Boston housing data: estimated coefficients as a function of the regularization parameter, for the elastic net, Bridge, and NNG methods.
Boston housing data: estimated coefficients as a function of the regularization parameter, for the Ridge, LASSO, and SCAD methods.
(a) Examples of differentiable and non‐differentiable convex penalties. (b) Examples of differentiable nonconvex penalties (ψ(β) = 1 − exp(−γβ²)).
SCAD penalty, which is nonconvex and non‐differentiable at zero, for three values of a.
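To make the penalties named in the captions concrete, here is a minimal sketch of the SCAD penalty in its usual piecewise form, together with the differentiable nonconvex penalty ψ(β) = 1 − exp(−γβ²). The function names `scad_penalty` and `exp_penalty`, the tuning values, and the three values of a used in the loop are assumptions chosen for illustration; they are not necessarily the values plotted in the article's figures.

```python
import numpy as np


def scad_penalty(beta, lam=1.0, a=3.7):
    """SCAD penalty, evaluated elementwise; requires a > 2."""
    if a <= 2:
        raise ValueError("SCAD requires a > 2")
    b = np.abs(np.asarray(beta, dtype=float))
    linear = lam * b                                          # |beta| <= lam
    quad = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))  # lam < |beta| <= a*lam
    const = lam**2 * (a + 1) / 2                              # |beta| > a*lam
    return np.where(b <= lam, linear, np.where(b <= a * lam, quad, const))


def exp_penalty(beta, gamma=1.0):
    """Differentiable nonconvex penalty psi(beta) = 1 - exp(-gamma * beta^2)."""
    b = np.asarray(beta, dtype=float)
    return 1.0 - np.exp(-gamma * b**2)


grid = np.linspace(-6.0, 6.0, 7)
for a in (2.5, 3.7, 5.0):  # illustrative values of a only
    print(a, scad_penalty(grid, lam=1.0, a=a))
print(exp_penalty(grid, gamma=0.5))
```

Near zero the SCAD penalty behaves like the LASSO penalty λ|β|, which gives the kink at the origin, and it is constant beyond aλ, which is why it cannot be convex.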