1 Kettenring, JR. Massive datasets. WIREs Comput Stat 2009, 1:25–32.
2 Huber, PJ. Massive datasets workshop: four years later. J Comput Graph Stat 1999, 8:635–652.
3 Hotz, RL. More scientists treat experiments as a team sport. Wall Street J 2009, 20:A23.
4 Knuteson, B, Padley, P. Statistical challenges with massive datasets in particle physics. J Comput Graph Stat 2003, 12:808–828.
5 McMullen, PD, Morimoto, RI, Nunes Amaral, LA. Physically grounded approach for estimating gene expression from microarray data. Proc Natl Acad Sci U S A 2010, 107:13690–13695.
6 Alter, O. Discovery of principles of nature from mathematical modeling of DNA microarray data. Proc Natl Acad Sci U S A 2006, 103:16063–16064.
7 Cho, H, Dhillon, IS. Coclustering of human cancer microarrays using minimum sum‐squared residue coclustering. IEEE/ACM Trans Comput Biol Bioinf 2008, 5:385–400.
8 Sood, R, Zehnder, JL, Druzin, ML, Brown, PO. Gene expression patterns in human placenta. Proc Natl Acad Sci U S A 2006, 103:5478–5483.
9 Chi, J‐T, Chang, HY, Haraldsen, G, Jahnsen, FL, Troyanskaya, OG, Chang, DS, Wang, Z, Rockson, SG, van de Rijn, M, Botstein, D, et al. Endothelial cell diversity revealed by global expression profiling. Proc Natl Acad Sci U S A 2003, 100:10623–10628.
10 Chang, HY, Nuyten, DSA, Sneddon, JB, Hastie, T, Tibshirani, R, Sørlie, T, Dai, H, He, YD, van't Veer, LJ, Bartelink, H, et al. Robustness, scalability, and integration of a wound‐response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci U S A 2005, 102:3738–3743.
11 Rubins, KH, Hensley, LE, Jahrling, PB, Whitney, AR, Geisbert, TW, Huggins, JW, Owen, A, LeDuc, JW, Brown, PO, Relman, DA. The host response to smallpox: analysis of the gene expression program in peripheral blood cells in a nonhuman primate model. Proc Natl Acad Sci U S A 2004, 101:15190–15195.
12 Smyth, GK, Yang, YH, Speed, T. "Statistical issues in cDNA microarray data analysis." Functional Genomics: Methods and Protocols. Totowa, NJ: Humana Press; 2002, 111–136.
13 Eisen, MB, Spellman, PT, Brown, PO, Botstein, D. Cluster analysis and display of genome‐wide expression patterns. Proc Natl Acad Sci U S A 1998, 95:14863–14868.
14 Lazzeroni, L, Owen, A. Plaid models for gene expression data. Stat Sin 2002, 12:61–86.
15 Foster, DP, Stine, RA. Variable selection in data mining: building a predictive model for bankruptcy. J Am Stat Assoc 2004, 99:303–313.
16 Zywicki, TJ. An economic analysis of the consumer bankruptcy crisis. Northwest Univ Law Rev 2005, 99:1463–1541.
17 Deerwester, S, Dumais, ST, Landauer, TK, Furnas, GW, Harshman, RA. Indexing by latent semantic analysis. J Am Soc Inf Sci 1990, 41:391–407.
18 Letsche, TA, Berry, MW. Large‐scale information retrieval with latent semantic indexing. Inf Sci 1997, 100:105–137.
19 Dhillon, IS, Modha, DS. Concept decomposition for large sparse text data using clustering. Mach Learn 2001, 42:143–175.
20 Li, J, Zha, H. Two‐way Poisson mixture models for simultaneous document classification and word clustering. Comput Stat Data Anal 2006, 50:163–180.
21 Hotelling, H. Analysis of a complex of statistical variables into principal components. J Edu Psych 1933, 24:417–441; 498–520.
22 Draper, NR, Smith, H. Applied Regression Analysis. 3rd ed. New York: John Wiley & Sons
23 George, EI. The variable selection problem. J Am Stat Assoc 2000, 95:1304–1308.
24 Hand, DJ. Discrimination and Classification. New York: John Wiley & Sons
25 McKay, RJ, Campbell, NA. Variable selection techniques in discriminant analysis. I. Description. Br J Math Stat Psychol 1982, 35:1–29.
26 McKay, RJ, Campbell, NA. Variable selection techniques in discriminant analysis. II. Allocation. Br J Math Stat Psychol 1982, 35:30–41.
27 Seber, GAF. Multivariate Observations. New York: John Wiley & Sons
28 Fowlkes, EB, Gnanadesikan, R, Kettenring, JR. "Variable selection in clustering and other contexts." In: Mallows, CL, ed. Design, Data, and Analysis. New York: John Wiley & Sons; 1987, 13–34.
29 Tibshirani, R. Regression shrinkage and selection via the lasso. J R Stat Soc, Ser B 1996, 58:267–288.
30 Guyon, I, Elisseeff, A. An introduction to variable and feature selection. J Mach Learn Res 2003, 3:1157–1182.
31 Chen, L, Buja, A. Local multidimensional scaling for nonlinear dimension reduction, graph drawing, and proximity analysis. J Am Stat Assoc 2009, 104:209–219.
32 Wang, H. Forward regression for ultra‐high dimensional variable screening. J Am Stat Assoc 2009, 104:1512–1524.
33 Meier, L, van de Geer, S, Bühlmann, P. The group lasso for logistic regression. J R Stat Soc, Ser B 2008, 70:53–71.
34 Ishwaran, H, Kogalur, UB, Gorodeski, EZ, Minn, AJ, Lauer, MS. High‐dimensional variable selection for survival data. J Am Stat Assoc 2010, 105:205–217.
35 Chernoff, H, Lo, S‐H, Zheng, T. Discovering influential variables: a method of partitions. Ann Appl Stat 2009, 3:1335–1369.
36 Hall, P, Titterington, DM, Xue, J‐H. Median‐based classifiers for high‐dimensional data. J Am Stat Assoc 2009, 104:1597–1608.
37 Kursa, MB, Rudnicki, WR. Feature selection with the Boruta package. J Stat Softw 2010, 36. Available at: http://www.jstatsoft.org/. (Accessed October 1, 2010).
38 Johnstone, IM, Lu, AY. On consistency and sparsity for principal components analysis in high dimensions. J Am Stat Assoc 2009, 104:682–693.
39 Chan, Y, Hall, P. Using evidence of mixed populations to select variables for clustering very high‐dimensional data. J Am Stat Assoc 2010, 105:798–809.
40 Hartigan, JA, Hartigan, PM. The DIP test of unimodality. Ann Stat 1985, 13:70–84.
41 Hand, DJ. Classifier technology and the illusion of progress (with discussion). Stat Sci 2006, 21:1–34.
42 Liu, H, Yu, L. Toward integrating feature selection algorithms for classification and clustering. IEEE Trans Knowl Data Eng 2005, 4:491–502.
43 Halko, N, Martinsson, PG, Tropp, JA. Finding structure with randomness: stochastic algorithms for constructing approximate matrix decompositions. Technical report 0909.4061, 2009. Available at: http://arxiv.org. (Accessed October 22, 2010).
44 Au, W‐H, Chan, KCC, Wong, AKC, Wang, W. Attribute clustering for grouping, selection, and classification of gene expression data. IEEE/ACM Trans Comput Biol Bioinf 2005, 2:83–101.
45 Mitra, P, Murthy, CA, Pal, SK. Unsupervised feature selection using feature similarity. IEEE Trans Pattern Anal Mach Intell 2002, 24:301–312.
46 Krupka, E, Tishby, N. Generalization from observed to unobserved features by clustering. J Mach Learn Res 2008, 9:339–370.