Hastie, T, Tibshirani, R, Friedman, J. The Elements of Statistical Learning. New York: Springer; 2001.

Mitchell, T. Machine Learning. New York: McGraw‐Hill; 1997.

Bishop, C. Pattern Recognition and Machine Learning. New York: Springer; 2006.

Bellman, R. Adaptive Control Processes. Princeton, NJ: Princeton University Press; 1961.

Cleveland, WS. Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc 1979; 74: 829–836.

Banks, D, Olszewski, R, Maxion, R. Comparing methods for multivariate nonparametric regression. Commun Stat: Simulation Comput 2003; 32: 541–571.

Hastie, T, Tibshirani, R. Generalized Additive Models. London: Chapman and Hall; 1990.

McCullagh, P, Nelder, J. Generalized Linear Models. 2nd ed. London: Chapman and Hall; 1989.

Friedman, J, Stuetzle, W. Projection pursuit regression. J Am Stat Assoc 1981; 76: 817–823.

Chen, H. Estimation of a projection‐pursuit type regression model. Ann Stat 1991; 19: 142–157.

Barron, A. Universal approximation bounds for superpositions of a sigmoidal function. IEEE Trans Information Theory 1993; 39: 930–945.

Breiman, L, Friedman, J, Olshen, R, Stone, C. Classification and Regression Trees. Belmont, CA: Wadsworth; 1984.

Friedman, J. Multivariate adaptive regression splines (with discussion). Ann Stat 1991; 19: 1–141.

Tibshirani, R. Regression shrinkage and selection via the lasso. J Roy Stat Soc, Series B 1996; 58: 267–288.

Fisher, RA. The use of multiple measurements in taxonomic problems. Ann Eugenics 1936; 7: 179–188.

Raudys, S, Young, DS. Results in statistical discriminant analysis: a review of the former Soviet Union literature. J Multivariate Anal 2004; 89: 1–35.

Vapnik, V. The Nature of Statistical Learning Theory. New York: Springer; 1995.

Cortes, C, Vapnik, V. Support‐vector networks. Mach Learn 1995; 20: 273–297.

Boser, B, Guyon, I, Vapnik, V. A training algorithm for optimal margin classifiers. Fifth Annual Workshop on Computational Learning Theory. Pittsburgh, PA: ACM; 1992; 144–152.

Breiman, L. Random forests. Mach Learn 2001; 45: 5–32.

Schapire, R. The strength of weak learnability. Mach Learn 1990; 5: 197–227.

Friedman, J, Hastie, T, Tibshirani, R. Additive logistic regression: a statistical view of boosting (with discussion). Ann Stat 2000; 28: 337–407.

Jardine, N, Sibson, R. Mathematical Taxonomy. New York: John Wiley & Sons; 1971.

Milligan, G, Cooper, M. An examination of procedures for determining the number of clusters in a data set. Psychometrika 1985; 50: 159–179.

Fisher, L, Van Ness, J. Admissible clustering procedures. Biometrika 1971; 58: 91–104.

Prim, RC. Shortest connection networks and some generalizations. Bell Syst Tech J 1957; 36: 1389–1401.

Friedman, J, Meulman, J. Clustering objects on subsets of variables (with discussion). J Roy Stat Soc, Series B 2004; 66: 815–849.

Inselberg, A. The plane with parallel coordinates. Visual Computer 1985; 1: 69–91.

MacQueen, JB. Some methods for classification and analysis of multivariate observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability. Berkeley, CA: University of California Press; 1967; 281–297.

Kohonen, T. Self‐Organization and Associative Memory. Berlin: Springer‐Verlag; 1989.

Lindsay, B. Mixture Models: Geometry, Theory, and Applications. Hayward, CA: Institute of Mathematical Statistics; 1995.

Dempster, A, Laird, N, Rubin, D. Maximum likelihood from incomplete data via the EM algorithm. J Roy Stat Soc, Series B 1977; 39: 1–22.

Banfield, JD, Raftery, A. Model‐based Gaussian and non‐Gaussian clustering. Biometrics 1993; 49: 803–821.

Donoho, D, Tanner, J. Sparse nonnegative solutions of underdetermined linear equations by linear programming. Proc Nat Acad Sci 2005; 102: 9446–9451.