WIREs Comp Stat

Bootstraps, permutation tests, and sampling orders of magnitude faster using SAS®

While permutation tests and bootstraps have very wide-ranging application, both share a common potential drawback: as data-intensive resampling methods, both can be runtime prohibitive when applied to large- or even medium-sized data samples drawn from large datasets. The data explosion of the past few decades has made this a common occurrence, and it highlights the increasing need for faster, more efficient, and more scalable permutation test and bootstrap algorithms. Seven bootstrap and six permutation test algorithms coded in SAS (whose developer, SAS Institute, is the largest privately owned software firm globally) are compared herein. The fastest algorithms ('OPDY' for the bootstrap, 'OPDN' for permutation tests) are new, use no modules beyond Base SAS, and run orders of magnitude faster than the relevant 'built-in' SAS procedures: OPDY is over 200× faster than Proc SurveySelect; OPDN is over 240× faster than Proc SurveySelect, over 350× faster than Proc NPAR1WAY (which crashes on datasets less than one-tenth the size OPDN can handle), and over 720× faster than Proc Multtest. OPDY is also much faster than hashing, which crashes on datasets smaller (sometimes by orders of magnitude) than those OPDY can handle. OPDY is easily generalizable to multivariate regression models, and OPDN, which uses an extremely efficient draw-by-draw random-sampling-without-replacement algorithm, can use virtually any permutation statistic, so both have a very wide range of application. Finally, the time complexity of both OPDY and OPDN is sublinear, making them not only the fastest but also the only truly scalable bootstrap and permutation test algorithms, respectively, in SAS.

WIREs Comput Stat 2013. doi: 10.1002/wics.1266

This article is categorized under:
Statistical and Graphical Methods of Data Analysis > Bootstrap and Resampling
Statistical and Graphical Methods of Data Analysis > Nonparametric Methods
Software for Computational Statistics > Software/Statistical Software
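The abstract does not include the OPDY or OPDN code, so the following is only a generic Python sketch (not SAS, and not the authors' algorithms) of the two resampling schemes being compared: a nonparametric bootstrap, which draws each resample with replacement, and a two-sample permutation test, which draws each permutation without replacement. For the without-replacement draw it uses Knuth's selection sampling (Algorithm S), a one-pass, draw-by-draw method chosen here purely for illustration; it is an assumption, not a claim about what OPDN does internally. The function names and example data are hypothetical.

```python
import random
from statistics import mean


def selection_sample_mask(N, n, rng):
    """Mark n of N positions, chosen without replacement in one sequential
    pass (Knuth's selection sampling, Algorithm S): position i is kept with
    probability needed / remaining. Hypothetical helper for illustration."""
    mask = [False] * N
    needed = n
    for i in range(N):
        if needed == 0:
            break
        remaining = N - i
        # U * remaining < needed  <=>  U < needed / remaining, with U in [0, 1)
        if rng.random() * remaining < needed:
            mask[i] = True
            needed -= 1
    return mask


def permutation_test(x, y, m=999, seed=None):
    """Two-sample permutation test of the difference in means.
    Returns (observed statistic, two-sided Monte Carlo p-value)."""
    rng = random.Random(seed)
    pooled = list(x) + list(y)
    N, n = len(pooled), len(x)
    observed = mean(x) - mean(y)
    extreme = 0
    for _ in range(m):
        # Each permutation: draw len(x) pooled values without replacement.
        mask = selection_sample_mask(N, n, rng)
        g1 = [v for v, keep in zip(pooled, mask) if keep]
        g2 = [v for v, keep in zip(pooled, mask) if not keep]
        if abs(mean(g1) - mean(g2)) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    return observed, (extreme + 1) / (m + 1)


def bootstrap(data, stat=mean, m=1000, seed=None):
    """Nonparametric bootstrap: m resamples of size len(data), each drawn
    with replacement; returns the m resampled statistics."""
    rng = random.Random(seed)
    n = len(data)
    return [stat(rng.choices(data, k=n)) for _ in range(m)]


if __name__ == "__main__":
    x = [12.1, 9.8, 11.4, 10.9, 13.0]
    y = [8.7, 9.1, 10.2, 7.9, 8.8, 9.5]
    obs, p = permutation_test(x, y, m=999, seed=1)
    print(f"observed diff = {obs:.3f}, p = {p:.4f}")
    boots = sorted(bootstrap(x, m=1000, seed=1))
    # Approximate 2.5th and 97.5th percentiles of 1000 sorted values.
    print(f"95% percentile CI for mean(x): ({boots[24]:.2f}, {boots[974]:.2f})")
```

Selection sampling touches each record once and needs no sort or shuffle of the pooled data, which is why one-pass, draw-by-draw designs like this are attractive for large inputs; the per-permutation cost here is still linear in N, so this sketch makes no claim to the sublinear scaling reported for OPDN.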
OPDN Real Runtime by N × n × m (N = All Strata)
OPDY Real Runtime by N × n × m (N = All Strata)
