WIREs Computational Statistics

Robust linear regression for high‐dimensional data: An overview


Abstract

Digitization, the process of converting information into numbers, leads to bigger and more complex data sets, bigger also with respect to the number of measured variables. This makes it harder or even impossible for the practitioner to identify outliers or observations that are inconsistent with an underlying model. Classical least-squares-based procedures can be affected by those outliers. In the regression context, this means that the parameter estimates are biased, with consequences for the validity of the statistical inference, for regression diagnostics, and for the prediction accuracy. Robust regression methods aim at assigning appropriate weights to observations that deviate from the model. While robust regression techniques are widely known in the low-dimensional case, researchers and practitioners might still not be very familiar with developments in this direction for high-dimensional data. Recently, different strategies have been proposed for robust regression in the high-dimensional case, typically based on dimension reduction, on shrinkage (including sparsity), and on combinations of such techniques. A very recent concept is downweighting single cells of the data matrix rather than complete observations, with the goal of making better use of the model-consistent information and thus achieving higher efficiency of the parameter estimates.

This article is categorized under:

Statistical and Graphical Methods of Data Analysis > Robust Methods
Statistical and Graphical Methods of Data Analysis > Analysis of High Dimensional Data
Statistical and Graphical Methods of Data Analysis > Dimension Reduction
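To make the weighting idea concrete (this sketch is not taken from the article), the following minimal numpy example fits a regression by iteratively reweighted least squares with a Huber weight function: observations with large residuals, measured against a robust (MAD) scale, receive weights below one. The tuning constant c = 1.345, the simulated data, and the contamination are illustrative assumptions only.

import numpy as np

def huber_weights(r, c=1.345):
    # Weight 1 for small scaled residuals, c/|r| beyond the cutoff c.
    a = np.abs(r)
    return np.where(a <= c, 1.0, c / a)

def irls_huber(X, y, n_iter=50, tol=1e-8):
    X1 = np.column_stack([np.ones(len(y)), X])        # add an intercept column
    beta = np.linalg.lstsq(X1, y, rcond=None)[0]      # least-squares start
    for _ in range(n_iter):
        resid = y - X1 @ beta
        scale = 1.4826 * np.median(np.abs(resid - np.median(resid)))  # MAD scale
        w = huber_weights(resid / scale)
        sw = np.sqrt(w)
        beta_new = np.linalg.lstsq(sw[:, None] * X1, sw * y, rcond=None)[0]
        if np.max(np.abs(beta_new - beta)) < tol:
            return beta_new
        beta = beta_new
    return beta

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=100)
y[:5] += 20                                           # a few vertical outliers
print("robust coefficients:", np.round(irls_huber(X, y), 2))
print("least-squares coefficients:",
      np.round(np.linalg.lstsq(np.column_stack([np.ones(100), X]), y, rcond=None)[0], 2))

On such contaminated data the least-squares coefficients are pulled toward the outliers, while the reweighted fit stays close to the generating values.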
Results from least-squares regression: fitted values versus response (left), prediction versus response (middle) and zoom (right); numbers on top are 10%‐trimmed root mean‐squared error
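The 10%-trimmed root mean-squared error reported in the figures limits the influence of outlying observations on the evaluation criterion by discarding the largest squared prediction errors before averaging. A minimal sketch of one plausible definition (the article's exact trimming convention may differ):

import numpy as np

def trimmed_rmse(y_true, y_pred, trim=0.10):
    # RMSE over the (1 - trim) fraction of smallest squared errors.
    sq = np.sort((np.asarray(y_true, float) - np.asarray(y_pred, float)) ** 2)
    keep = int(np.ceil((1.0 - trim) * sq.size))
    return float(np.sqrt(np.mean(sq[:keep])))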
Results from sparse partial robust M‐estimator: 10%‐trimmed root mean‐squared error from cross‐validation for all combinations of numbers of components q and sparsity parameter s (top); resulting number of nonzero coefficients for all parameter combinations (bottom)
Results from sparse partial robust M‐estimator: fitted values versus response (left), prediction versus response (middle) and zoom (right); numbers on top are 10%‐trimmed root mean‐squared error; color of the symbols indicates outlyingness information
Results from sparse PLS: fitted values versus response (left), prediction versus response (middle) and zoom (right); numbers on top are 10%‐trimmed root mean‐squared error
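scikit-learn provides only ordinary (non-sparse) partial least squares; the sparse PLS fit shown above additionally penalizes the loading vectors so that many predictors are excluded entirely. As a rough, non-sparse stand-in (an assumption for illustration, not the article's implementation):

import numpy as np
from sklearn.cross_decomposition import PLSRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(80, 200))            # p >> n, as in the high-dimensional setting
y = X[:, :5] @ np.ones(5) + rng.normal(size=80)

# Ordinary PLS with q = 3 latent components; a sparse PLS would add an
# L1-type penalty on the loadings to zero out irrelevant variables.
pls = PLSRegression(n_components=3).fit(X, y)
y_hat = pls.predict(X).ravel()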
Results from robust Elastic Net regression: fitted values versus response (left), prediction versus response (middle) and zoom (right); numbers on top are 10%‐trimmed root mean‐squared error; color of the symbols indicates outlyingness information
Results from Elastic Net regression: fitted values versus response (left), prediction versus response (middle) and zoom (right); numbers on top are 10%‐trimmed root mean‐squared error
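For the (non-robust) Elastic Net fit, a minimal cross-validated sketch with scikit-learn is given below; the penalty grid, simulated data, and preprocessing are assumptions rather than the article's settings.

import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 200))            # more variables than observations
y = X[:, :5] @ np.ones(5) + rng.normal(size=80)

# l1_ratio mixes the lasso (1.0) and ridge (0.0) penalties; alpha is chosen by CV.
enet = make_pipeline(
    StandardScaler(),
    ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5, max_iter=20000),
).fit(X, y)
print("selected nonzero coefficients:", int(np.sum(enet[-1].coef_ != 0)))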
Results from MM regression: fitted values versus response (left), prediction versus response (middle) and zoom (right); numbers on top are 10%‐trimmed root mean‐squared error; color of the symbols indicates outlyingness information
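A full MM-estimator combines a high-breakdown S-estimate of scale with an efficient M-step (in R it is available, for example, as lmrob in robustbase). Python's statsmodels offers only the M-step; the sketch below uses it with a Tukey bisquare loss purely to illustrate how the resulting observation weights flag outlying points, in the spirit of the outlyingness coloring in the figure. Data and coefficients are illustrative assumptions.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
X = rng.normal(size=(60, 4))
y = X @ np.array([1.0, 0.0, -1.0, 2.0]) + rng.normal(scale=0.3, size=60)
y[:4] += 15                               # contaminate a few responses

# M-estimation with Tukey's bisquare loss; weights near zero mark observations
# that the fit effectively discards as outliers.
fit = sm.RLM(y, sm.add_constant(X), M=sm.robust.norms.TukeyBiweight()).fit()
print("coefficients:", np.round(fit.params, 2))
print("smallest observation weights:", np.round(np.sort(fit.weights)[:6], 3))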

