How to cite this WIREs title: WIREs Comp Stat

Random projections: Data perturbation for classification problems

Full article available on Wiley Online Library.

Abstract

Random projections offer an appealing and flexible approach to a wide range of large-scale statistical problems. They are particularly useful in high-dimensional settings, where many covariates are recorded for each observation. In classification problems, there are two general techniques based on random projections. The first applies many projections in an ensemble: the results obtained after applying different random projections are aggregated, with the aim of achieving superior statistical accuracy. The second class of methods includes hashing and sketching techniques, which are straightforward ways to reduce the complexity of a problem, often with a huge computational saving, while approximately preserving statistical efficiency.

This article is categorized under:

Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
Statistical and Graphical Methods of Data Analysis > Analysis of High Dimensional Data
Statistical Models > Classification Models
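To make the ensemble idea concrete, here is a minimal NumPy sketch, under simplifying assumptions: it draws Haar-distributed two-dimensional projections, applies a simple base classifier in each projected space (a nearest-centroid rule, used here only as a stand-in for LDA, QDA, or knn), and aggregates by majority vote. The toy data and all function names are hypothetical, not from the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: two Gaussian classes in p = 50 dimensions, separated in the
# first three coordinates (a hypothetical example).
p, n = 50, 200
mu = np.zeros(p)
mu[:3] = 2.0
X = np.vstack([rng.normal(size=(n // 2, p)),
               rng.normal(size=(n // 2, p)) + mu])
y = np.repeat([0, 1], n // 2)

def random_projection(p, d, rng):
    """Draw a d x p matrix with orthonormal rows (Haar-distributed)."""
    A = rng.normal(size=(p, d))
    Q, _ = np.linalg.qr(A)      # Q has orthonormal columns
    return Q.T                  # d x p, rows orthonormal

def nearest_centroid_predict(Z_train, y_train, Z_test):
    """Simple base classifier applied to the projected data."""
    centroids = np.array([Z_train[y_train == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(Z_test[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def rp_ensemble_predict(X_train, y_train, X_test, B=100, d=2, rng=rng):
    """Aggregate base-classifier votes over B independent random projections."""
    votes = np.zeros(X_test.shape[0])
    for _ in range(B):
        A = random_projection(X_train.shape[1], d, rng)
        votes += nearest_centroid_predict(X_train @ A.T, y_train, X_test @ A.T)
    return (votes / B > 0.5).astype(int)   # majority vote

# Hold out half the data to estimate the test error.
idx = rng.permutation(n)
tr, te = idx[:n // 2], idx[n // 2:]
pred = rp_ensemble_predict(X[tr], y[tr], X[te])
print("estimated test error:", np.mean(pred != y[te]))
```

Note that the random projection ensemble classifier of Cannings and Samworth goes further: within each of B1 blocks it selects, among B2 candidate projections, the one minimizing an estimate of test error, and aggregates only those selected projections. The sketch above uses plain majority voting over all projections for brevity.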
Different two-dimensional projections of 200 observations in p = 50 dimensions. Top row: three projections drawn from Haar measure; bottom row: the projections with the smallest estimate of test error out of 100 Haar projections for the linear discriminant analysis (left), QDA (middle), and knn (right) base classifiers. (Reprinted with permission from Cannings and Samworth (2017). Copyright 2017, published by John Wiley & Sons Ltd on behalf of the Royal Statistical Society.)
The estimated test errors for the epileptic seizure recognition dataset. Left panel: n = 100. Right panel: n = 1,000
The estimated test errors for the class conditional Gaussian distribution setting. Left panel: n = 100. Right panel: n = 1,000. The Bayes risk in this problem is 15.9%
The average error (black) plus/minus two standard deviations (red) over 20 sets of B1 × B2 projections, for B1 ∈ {2, …, 500} and B2 = 50. The plots show the test error for one training dataset for the LDA (left), its quadratic counterpart (middle), and knn (right) projected-data base classifiers. (Reprinted with permission from Cannings and Samworth (2017). Copyright 2017, published by John Wiley & Sons Ltd on behalf of the Royal Statistical Society.)

