
Leveraging for big data regression


Rapid advances in science and technology over the past decade have produced an extraordinary amount of data, offering researchers an unprecedented opportunity to tackle complex research challenges. This opportunity has not yet been fully exploited, however, because effective and efficient statistical tools for analyzing super-large datasets are still lacking. One major challenge is that the growth of computing resources still lags far behind the exponential growth of databases. To facilitate scientific discovery with current computing resources, one may use an emerging family of statistical methods called leveraging. Leveraging methods are designed under a subsampling framework, in which one samples a small proportion of the data (a subsample) from the full sample and then performs the intended computations for the full sample using the small subsample as a surrogate. The key to the success of leveraging methods is to construct nonuniform sampling probabilities so that influential data points are sampled with high probability. These methods stand as a unique development of their type in big data analytics and allow pervasive access to massive amounts of information without resorting to high-performance computing or cloud computing.

WIREs Comput Stat 2015, 7:70–76. doi: 10.1002/wics.1324

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Dimension Reduction
Statistical and Graphical Methods of Data Analysis > Sampling
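
The abstract describes the subsampling scheme only in words. One common way to make it concrete, offered here as an illustrative assumption rather than as the article's own formulation, is to sample rows of a linear model y = Xβ + ε with probabilities proportional to their leverage scores and then solve a weighted least-squares problem on the subsample. The symbols h_ii, π_i, r, and β̃ are notation introduced for this sketch, and X is assumed to be an n × p matrix of full column rank with rows x_i.

```latex
\begin{align*}
h_{ii} &= x_i^{\top} (X^{\top} X)^{-1} x_i,
  \qquad \sum_{i=1}^{n} h_{ii} = p, \\
\pi_i &= \frac{h_{ii}}{p}, \qquad i = 1, \dots, n, \\
\tilde{\beta} &= \arg\min_{\beta} \sum_{j=1}^{r}
  \frac{1}{\pi_{i_j}} \left( y_{i_j} - x_{i_j}^{\top} \beta \right)^{2},
\end{align*}
```

Here the indices i_1, ..., i_r are drawn independently with replacement, with P(i_j = i) = π_i and r much smaller than n. Setting π_i = 1/n instead gives uniform subsampling, in which case the weights are constant and the estimator reduces to ordinary least squares on the subsample.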
A random sample of size n = 500 was generated from y_i = −x_i + ε_i, where x_i is t(6)-distributed and ε_i ∼ N(0, 1). The true regression function is shown as the dotted black line, the data as black circles, and the OLS estimator using the full sample as the solid black line. (a) The uniform leveraging estimator is shown as the green dashed line, with the uniform leveraging subsample superimposed as green crosses. (b) The weighted leveraging estimator is shown as the red dot-dashed line, with the points in the weighted leveraging subsample superimposed as red crosses.
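
The simulation in this caption can be sketched in a few lines of Python with NumPy. This is a minimal sketch under stated assumptions, not the authors' code: the subsample size `r = 20`, the random seed, the inclusion of an intercept column, and the helper names `leverage_scores` and `subsample_ols` are all hypothetical choices made for illustration. The leverage-based fit is assumed here to correspond to the caption's weighted leveraging estimator, using sampling probabilities proportional to the hat-matrix diagonal and weights equal to the inverse sampling probabilities.

```python
import numpy as np


def leverage_scores(X):
    """Diagonal of the hat matrix X (X^T X)^{-1} X^T, computed via a thin QR."""
    Q, _ = np.linalg.qr(X)  # default 'reduced' mode: Q has shape (n, p)
    return np.sum(Q**2, axis=1)


def subsample_ols(X, y, probs, r, rng):
    """Draw r rows with replacement using probs, then fit least squares
    on the subsample with weights 1 / probs (hypothetical helper)."""
    idx = rng.choice(len(y), size=r, replace=True, p=probs)
    sqrt_w = np.sqrt(1.0 / probs[idx])
    # Weighted least squares is OLS on rows rescaled by sqrt(weight).
    beta, *_ = np.linalg.lstsq(X[idx] * sqrt_w[:, None], y[idx] * sqrt_w, rcond=None)
    return beta, idx


rng = np.random.default_rng(0)   # seed is an arbitrary choice
n, r = 500, 20                   # n from the caption; r is an assumption

# Data as in the caption: y_i = -x_i + eps_i, x_i ~ t(6), eps_i ~ N(0, 1).
x = rng.standard_t(df=6, size=n)
y = -x + rng.standard_normal(n)
X = np.column_stack([np.ones(n), x])  # intercept column is an assumption

# Full-sample OLS for reference.
beta_full, *_ = np.linalg.lstsq(X, y, rcond=None)

# Uniform versus leverage-based sampling probabilities.
h = leverage_scores(X)
p_unif = np.full(n, 1.0 / n)
p_lev = h / h.sum()

beta_unif, idx_unif = subsample_ols(X, y, p_unif, r, rng)
beta_lev, idx_lev = subsample_ols(X, y, p_lev, r, rng)

print("full sample   :", beta_full)
print("uniform sample:", beta_unif)
print("leverage-based:", beta_lev)
```

Because x is heavy-tailed, a handful of points far from the center carry most of the leverage. The leverage-based probabilities make those points much more likely to enter the subsample, which is the behavior panel (b) of the figure illustrates.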

