Aitkin, M., & Wilson, G. T. (1980). Mixture models, outliers, and the EM algorithm. Technometrics, 22, 325–331.

Arias‐Castro, E., Lerman, G., & Zhang, T. (2013). Spectral clustering based on local PCA. arXiv preprint.

Azzalini, A., & Torelli, N. (2007). Clustering via nonparametric density estimation. Statistics and Computing, 17, 71–80.

Babaud, J., Witkin, A. P., Baudin, M., & Duda, R. O. (1994). Uniqueness of the Gaussian kernel for scale‐space filtering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1, 26–33.

Bengio, S., & Bengio, Y. (2000). Taking on the curse of dimensionality in joint distributions using neural networks. IEEE Transactions on Neural Networks, 11, 550–557.

Breese, J. S., Heckerman, D., & Kadie, C. (1998). Empirical analysis of predictive algorithms for collaborative filtering. In *Proceedings of the 14th conference on Uncertainty in Artificial Intelligence (UAI’98)* (pp. 43–52). Madison, Wisconsin.

Carmichael, J. W., George, J. A., & Julius, R. S. (1968). Finding natural clusters. Systematic Zoology, 17, 144–150.

Carreira‐Perpinan, M., & Williams, C. (2003). On the number of modes of a Gaussian mixture. In *Scale space methods in computer vision* (pp. 625–640). Springer.

Chacón, J. E. (2018). Mixture model modal clustering. Advances in Data Analysis and Classification, 1–26.

Chacón, J. E., & Duong, T. (2013). Data‐driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electronic Journal of Statistics, 7, 499–532.

Chaudhuri, P., & Marron, J. S. (1999). SiZer for exploration of structures in curves. Journal of the American Statistical Association, 94, 807–823.

Chaudhuri, P., & Marron, J. S. (2000). Scale space view of curve estimation. Annals of Statistics, 28, 408–428.

Chazal, F., Guibas, L. J., Oudot, S., & Skraba, P. (2013). Persistence‐based clustering in Riemannian manifolds. Journal of the ACM, 60, 41.

Chen, Y.‐C., Genovese, C. R., & Wasserman, L. (2016). A comprehensive approach to mode clustering. Electronic Journal of Statistics, 10, 210–241.

Cheng, Y. (1995). Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17, 790–799.

Chi, E. C., Allen, G. I., & Baraniuk, R. G. (2017). Convex biclustering. Biometrics, 73, 10–19.

Chi, E. C., & Lange, K. (2015). Splitting methods for convex clustering. Journal of Computational and Graphical Statistics, 24, 994–1013.

Cilibrasi, R., & Vitanyi, P. M. B. (2005). Clustering by compression. IEEE Transactions on Information Theory, 51, 1523–1545.

Cleveland, W. S. (1993). Visualizing data. Summit, NJ: Hobart Press.

Cleveland, W. S. (1994). The elements of graphing data. Summit, NJ: Hobart Press.

Comaniciu, D., & Meer, P. (2002). Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24, 603–619.

Dhillon, I. S., Guan, Y., & Kulis, B. (2004). Kernel *k*‐means: Spectral clustering and normalized cuts. In *Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining* (pp. 551–556).

Dinh, L., Sohl‐Dickstein, J., & Bengio, S. (2017). Density estimation using real NVP. In *Fifth international conference on learning representations*.

Dobra, A., & Lenkoski, A. (2011). Copula Gaussian graphical models and their application to modeling functional disability data. The Annals of Applied Statistics, 5, 969–993.

Eilers, P. H. C., & Marx, B. D. (1996). Flexible smoothing with B‐splines and penalties. Statistical Science, 11, 89–102.

Erästö, P., & Holmström, L. (2005). Bayesian multiscale smoothing for making inferences about features in scatterplots. Journal of Computational and Graphical Statistics, 14, 569–589.

Frey, B. J., Hinton, G. E., & Dayan, P. (1996). Does the wake‐sleep algorithm learn good density estimators? Advances in Neural Information Processing Systems, 8, 661–670.

Friedman, J. H. (1987). Exploratory projection pursuit. Journal of the American Statistical Association, 82, 249–266.

Friedman, J. H., & Stuetzle, W. (1981). Projection pursuit regression. Journal of the American Statistical Association, 76, 817–823.

Friedman, J. H., Stuetzle, W., & Schroeder, A. (1984). Projection pursuit density estimation. Journal of the American Statistical Association, 79, 599–608.

Friedman, J. H., & Tukey, J. W. (1974). A projection pursuit algorithm for exploratory data analysis. IEEE Transactions on Computers, C‐23, 881–890.

Fukunaga, K., & Hostetler, L. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Transactions on Information Theory, 21, 31–40.

Germain, M., Gregor, K., Murray, I., & Larochelle, H. (2015). MADE: Masked autoencoder for distribution estimation. In *Proceedings of the 32nd international conference on machine learning* (pp. 881–889).

Good, I. J., & Gaskins, R. A. (1980). Density estimation and bump‐hunting by the penalized likelihood method exemplified by the scattering and meteorite data (with discussion). Journal of the American Statistical Association, 75, 42–73.

Goodfellow, I., Bengio, Y., & Courville, A. (2016). Deep learning (pp. 185–191). MIT Press.

Gordon, N., Salmond, D. J., & Smith, A. F. M. (1993). Novel approach to nonlinear/non‐Gaussian Bayesian state estimation. IEE Proceedings on Radar and Signal Processing, 140, 107–113.

Gregor, K., & LeCun, Y. (2011). *Learning representations by maximizing compression*. Technical Report, arXiv:1108.1169.

Hartigan, J. A. (1975). Clustering algorithms. New York, NY: John Wiley & Sons.

Hathaway, R. J. (1985). A constrained formulation of maximum‐likelihood estimation for normal mixture distributions. The Annals of Statistics, 13, 795–800.

Hjort, N. L. (1986). *On frequency polygons and averaged shifted histograms in higher dimensions*. Technical Report 22, Stanford University.

Iwata, T., & Yamada, M. (2016). Multi‐view anomaly detection via robust probabilistic latent variable models. In *30th conference on neural information processing systems (NIPS 2016)*, Barcelona, Spain.

Izenman, A. J. (2008). Modern multivariate statistical techniques: Regression, classification, and manifold learning. New York, NY: Springer.

Jee, J. R. (1987). Exploratory projection pursuit using nonparametric density estimation. In Proceedings of the Statistical Computing Section (pp. 335–339). Washington, DC: American Statistical Association.

Johnson, S. C. (1967). Hierarchical clustering schemes. Psychometrika, 32, 241–254.

Jones, M. C., & Sibson, R. (1987). What is projection pursuit? Journal of the Royal Statistical Society. Series A (General), 150, 1–37.

Jordan, M. I. (2004). Graphical models. Statistical Science, 19, 140–155.

Kingma, D. P., Salimans, T., Jozefowicz, R., Chen, X., Sutskever, I., & Welling, M. (2016). Improved variational inference with inverse autoregressive flow. Advances in Neural Information Processing Systems, 29, 4743–4751.

Klemelä, J. (2008). Mode trees for multivariate data. Journal of Computational and Graphical Statistics, 17, 860–869.

Klemelä, J. (2009). Smoothing of multivariate data: Density estimation and visualization. Hoboken, NJ: John Wiley & Sons.

Koller, D., & Friedman, N. (2009). Probabilistic graphical models: Principles and techniques. MIT Press.

Kong, A., Liu, J. S., & Wong, W. H. (1994). Sequential imputations and Bayesian missing data problems. Journal of the American Statistical Association, 89, 278–288.

Larochelle, H., & Murray, I. (2011). The neural autoregressive distribution estimator. In *Proceedings of the fourteenth international conference on artificial intelligence and statistics* (pp. 29–37).

Li, D., Yang, K., & Wong, W. H. (2016). Density estimation via discrepancy based adaptive sequential partition. In *30th conference on neural information processing systems (NIPS 2016)*, Barcelona, Spain.

Li, J., Ray, S., & Lindsay, B. G. (2007). A nonparametric statistical approach to clustering via mode identification. Journal of Machine Learning Research, 8, 1687–1723.

Liu, A. Y., & Lam, D. N. (2012). Using consensus clustering for multi‐view anomaly detection. IEEE CS Security and Privacy Workshops (pp. 117–124). San Francisco, CA.

Liu, H., Han, F., Yuan, M., Lafferty, J., & Wasserman, L. (2012). High‐dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics, 40, 2293–2326.

Liu, J. S. (2001). Monte Carlo strategies in scientific computing (Springer Series in Statistics). New York, NY: Springer.

Lopez, T. S., Brintrup, A., Isenberg, M.‐A., & Mansfeld, J. (2011). Resource management in the Internet of things: Clustering, synchronisation and software agents. In Architecting the Internet of Things (pp. 159–193). Springer.

Lu, L., Jiang, H., & Wong, W. H. (2013). Multivariate density estimation by Bayesian sequential partitioning. Journal of the American Statistical Association, 108, 1402–1410.

MacQueen, J. B. (1967). Some methods for classification and analysis of multivariate observations. In *Proceedings of the 5th Berkeley symposium on mathematical statistics and probability* (Vol. 1, pp. 281–297). University of California Press.

Magdon‐Ismail, M., & Atiya, A. (1998). Neural networks for density estimation. In *NIPS* (pp. 522–528).

Mammen, E., Marron, J. S., & Fisher, N. I. (1994). Asymptotics for multimodality tests based on kernel density estimates. Probability Theory and Related Fields, 91, 115–132.

Marchette, D. J., Priebe, C. E., Rogers, G. W., & Wegman, E. J. (1996). Filtered kernel density estimation. Computational Statistics, 11, 112.

Marchetti, Y., Nguyen, H., Braverman, A., & Cressie, N. (2018). Spatial data compression via adaptive dispersion clustering. Computational Statistics and Data Analysis, 117, 138–153.

Matthews, M. V. (1983). *On Silverman's test for the number of modes in a univariate density function*. (Honors Bachelor's Thesis). Harvard University.

McLachlan, G., & Krishnan, T. (2008). The EM algorithm and extensions (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Meila, M., & Shi, J. (2000). Learning segmentation by random walks. Neural Information Processing Systems, 13, 873–879.

Menardi, G. (2016). A review on modal clustering. International Statistical Review, 84, 413–433.

Minka, T. (2005). *Divergence measures and message passing*. Technical Report MSR‐TR‐2005‐173, Microsoft Research.

Minnotte, M. C. (1997). Nonparametric testing of the existence of modes. The Annals of Statistics, 25, 1646–1660.

Minnotte, M. C. (2010). Mode testing via higher‐order density estimation. Computational Statistics, 25, 391–407.

Minnotte, M. C., Marchette, D. J., & Wegman, E. J. (1998). The bumpy road to the mode forest. Journal of Computational and Graphical Statistics, 7, 239–251.

Minnotte, M. C., & Scott, D. W. (1993). The mode tree: A tool for visualization of nonparametric density features. Journal of Computational and Graphical Statistics, 2, 51–68.

Müller, D. W., & Sawitzki, G. (1991). Excess mass estimates and tests for multimodality. Journal of the American Statistical Association, 86, 738–746.

Papamakarios, G., Pavlakou, T., & Murray, I. (2017). Masked autoregressive flow for density estimation. In *31st conference on neural information processing systems (NIPS 2017)* (pp. 2335–2344). Long Beach, CA.

Pearson, K. (1894). Contributions to the mathematical theory of evolution. Philosophical Transactions of the Royal Society London (A), 185, 71–110.

Pham, M. C., Cao, Y., Klamma, R., & Jarke, M. (2011). A clustering approach for collaborative filtering recommendation using social network analysis. Journal of Universal Computer Science, 17, 583–604.

Ray, S., & Lindsay, B. G. (2005). The topography of multivariate normal mixtures. Annals of Statistics, 33, 2042–2065.

Rezende, D. J., & Mohamed, S. (2015). Variational inference with normalizing flows. In *Proceedings of the 32nd international conference on machine learning* (pp. 1530–1538).

Roeder, K. (1990). Density estimation with confidence sets exemplified by superclusters and voids in the galaxies. Journal of the American Statistical Association, 85, 617–624.

Scott, D. W. (1979). On optimal and data‐based histograms. Biometrika, 66, 605–610.

Scott, D. W. (1985). Frequency polygons: Theory and application. Journal of the American Statistical Association, 80, 348–354.

Scott, D. W. (2015). Multivariate density estimation: Theory, practice, and visualization (2nd ed.). Hoboken, NJ: John Wiley & Sons.

Scott, D. W., & Szewczyk, W. F. (1997). *Bumps along the road towards multivariate mode trees*. NSF workshop on bumps, jumps, clustering and discrimination, May 11–14, 1997, Houston, TX.

Scott, D. W., Tapia, R. A., & Thompson, J. R. (1977). Kernel density estimation revisited. Journal of Nonlinear Analysis Theory Methods and Applications, 1, 339–372.

Shi, J., & Malik, J. (2000). Normalized cuts and image segmentation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(8), 888–905.

Silverman, B. W. (1981). Using kernel density estimates to investigate multimodality. Journal of the Royal Statistical Society. Series B, 43, 97–99.

Silverman, B. W. (1986). Density estimation for statistics and data analysis. London, England: Chapman and Hall.

Stone, C. J. (1994). The use of polynomial splines and their tensor products in multivariate function estimation. The Annals of Statistics, 22, 118–171.

Stuetzle, W. (2003). Estimating the cluster tree of a density by analyzing the minimal spanning tree of a sample. Journal of Classification, 20, 25–47.

Stuetzle, W., & Nugent, R. (2010). A generalized single linkage method for estimating the cluster tree of a density. Journal of Computational and Graphical Statistics, 19, 397–418.

Su, X., & Khoshgoftaar, T. M. (2009). A survey of collaborative filtering techniques. Advances in Artificial Intelligence, 2009, 421425–421443.

Tasoulis, S. K., Epitropakis, M. G., Plagianakos, V. P., & Tasoulis, D. K. (2012). Density based projection pursuit clustering. In *2012 IEEE congress on evolutionary computation (CEC)* (pp. 1–12).

Uria, B., Murray, I., & Larochelle, H. (2014). A deep and tractable density estimator. In *Proceedings of the 31st international conference on machine learning*, JMLR W&CP **32** (pp. 467–475). Beijing, China.

Uria, B., Cote, M.‐A., Gregor, K., Murray, I., & Larochelle, H. (2016). Neural autoregressive distribution estimation. Journal of Machine Learning Research, 17, 1–37.

Uria, B., Murray, I., & Larochelle, H. (2013). RNADE: The real‐valued neural autoregressive density estimator. Advances in Neural Information Processing Systems (Vol. 2, pp. 2175–2183). Lake Tahoe, NV.

Wainwright, M. (2008). Graphical models, exponential families, and variational inference. Foundations and Trends in Machine Learning, 1, 1–305.

Wand, M. P. (2017). Fast approximate inference for arbitrarily large semiparametric regression models via message passing. Journal of the American Statistical Association, 112, 137–168.

Wang, X., & Wang, Y. (2015). Nonparametric multivariate density estimation using mixtures. Statistics and Computing, 25, 349–364.

Wasserman, L. (2018). Topological data analysis. Annual Review of Statistics and its Application, 5, 501–532.

Wong, W. H. (2014). Multivariate density estimation and its applications. In *Conference in honor of the 80th birthday of Professor Grace Wahba*, June 2014, Madison, WI.

Wong, W. H., & Ma, L. (2010). Optional Polya tree and Bayesian inference. Annals of Statistics, 38, 1433–1459.

Wong, Y. (1993). Clustering data by melting. Neural Computation, 5, 89–104.

Yedidia, J. S., Freeman, W. T., & Weiss, Y. (2003). Understanding belief propagation and its generalizations. In Exploring artificial intelligence in the new millennium (Vol. 8, pp. 239–269). San Francisco, CA: Morgan Kaufmann Publishers Inc.

Zhu, J.‐Y., Park, T., Isola, P., & Efros, A. A. (2017). Unpaired image‐to‐image translation using cycle‐consistent adversarial networks. In *IEEE international conference on computer vision (ICCV)*. Venice, Italy.