How to cite this WIREs title:
WIREs Comp Stat

Detecting clusters in multivariate response regression

Abstract
Multivariate regression, which can also be posed as a multitask machine learning problem, is used to better understand multiple outputs based on a given set of inputs. Many methods have been proposed to exploit shared information about responses, with applications in fields such as economics, genomics, advanced manufacturing, and precision medicine. Interest in these areas, coupled with the rise of large data sets (“big data”), has spurred work not only on making the computations more efficient but also on developing methods that account for the heterogeneity that may exist between responses. One way to exploit this heterogeneity is to use methods that detect groups, also called clusters, of related responses. These methods provide a framework that can increase computational speed and account for the complexity of the relationships among a large number of responses. With this flexibility come additional challenges, such as how to identify these clusters of responses, how to perform model selection, and how to develop more complex algorithms that combine concepts from both the supervised and unsupervised learning literature. We explore current state-of-the-art methods, present a framework to better understand methods that utilize or detect clusters of responses, and provide insights on the computational challenges associated with this framework. Specifically, we present a simulation study that discusses the challenges of model selection when detecting clusters of responses. We also comment on extensions and open problems that are of interest to both the research and practitioner communities.

This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
Statistical Learning and Exploratory Methods of the Data Sciences > Exploratory Data Analysis
Statistical Learning and Exploratory Methods of the Data Sciences > Modeling Methods

A visual representation of the inverse covariance matrix of the errors, Ω, for the asset return example presented in Rothman et al. (2010). The graph is connected, which indicates that there are no disjoint groups of responses based on the estimate of Ω.
Results from the simulations comparing tuning parameter selection methods when p = 300 over 50 replications. Panel (a) compares MSPE on 1000 testing observations. Panel (b) shows the number of truly active predictors identified as active (maximum 150). Panel (c) shows the number of truly inactive predictors identified as active (maximum 4350). Panel (d) shows cluster accuracy, reported as the percentage of the 50 replications falling into each of four categories of cluster assignment: (i) correct cluster assignments, (ii) correct number of clusters but wrong assignment, (iii) too many clusters, (iv) too few clusters. Panel (d) covers only methods that require clusters to be identified when fitting. Panel (e) compares the average run time to select tuning parameters for each approach.
Results from the simulations comparing tuning parameter selection methods when p = 12 over 50 replications. Panel (a) compares MSPE on 1000 testing observations. Panel (b) shows the number of truly active predictors identified as active (maximum 60). Panel (c) shows the number of truly inactive predictors identified as active (maximum 120). Panel (d) shows cluster accuracy, reported as the percentage of the 50 replications falling into each of four categories of cluster assignment: (i) correct cluster assignments, (ii) correct number of clusters but wrong assignment, (iii) too many clusters, (iv) too few clusters. Panel (d) covers only methods that require clusters to be identified when fitting. Panel (e) compares the average run time to select tuning parameters for each approach.
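
The four cluster-accuracy categories reported in panel (d) can be made concrete with a small sketch. It is only an illustration, not the authors' code: the function names `partition` and `cluster_category` and the category strings are hypothetical, and it assumes each response receives exactly one cluster label and that two assignments count as identical when they induce the same grouping, regardless of how the labels are numbered.

```python
from collections import defaultdict


def partition(labels):
    """Turn a label vector into a set of groups, ignoring label names."""
    groups = defaultdict(set)
    for response_index, label in enumerate(labels):
        groups[label].add(response_index)
    return {frozenset(group) for group in groups.values()}


def cluster_category(true_labels, est_labels):
    """Classify an estimated assignment into one of four categories.

    The category names are hypothetical shorthand for the four cases
    listed in the caption.
    """
    true_part = partition(true_labels)
    est_part = partition(est_labels)
    if est_part == true_part:
        return "correct"
    if len(est_part) == len(true_part):
        return "right_number_wrong_assignment"
    if len(est_part) > len(true_part):
        return "too_many"
    return "too_few"


# Relabeling the clusters does not change the category.
print(cluster_category([0, 0, 1, 1], [1, 1, 0, 0]))
```

Tallying the returned category over the replications and dividing by the number of replications would give percentages of the kind shown in panel (d).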
A visual representation of a possible outcome for the inverse covariance matrix of the errors, Ω, for the asset return example presented in Rothman et al. (2010). The graph is not connected, which shows that two groups, or clusters, were identified based on an estimate of Ω.
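
One way to read clusters off an estimate of Ω, as the graph captions suggest, is to treat each response as a node, connect two responses whenever the corresponding off-diagonal entry of Ω is nonzero, and take the connected components of that graph as the clusters. The sketch below illustrates this idea under stated assumptions: the function name `response_clusters` and the tolerance `tol` are hypothetical, Ω is passed as a square nested list, and entries with absolute value at or below `tol` are treated as zero. It is not the procedure used in the article or in Rothman et al. (2010).

```python
from collections import deque


def response_clusters(omega, tol=1e-8):
    """Label responses by connected component of the graph implied by omega.

    Two responses j and k are linked when |omega[j][k]| or |omega[k][j]|
    exceeds tol (tol is an assumed numerical threshold for "nonzero").
    Returns (number_of_clusters, labels).
    """
    q = len(omega)
    labels = [-1] * q  # -1 marks a response not yet assigned to a cluster
    n_clusters = 0
    for start in range(q):
        if labels[start] != -1:
            continue
        # Breadth-first search over everything reachable from `start`.
        labels[start] = n_clusters
        queue = deque([start])
        while queue:
            j = queue.popleft()
            for k in range(q):
                linked = abs(omega[j][k]) > tol or abs(omega[k][j]) > tol
                if k != j and linked and labels[k] == -1:
                    labels[k] = n_clusters
                    queue.append(k)
        n_clusters += 1
    return n_clusters, labels


# A block-diagonal matrix: responses {0, 1} and {2, 3} are never linked.
block_omega = [
    [2.0, 0.5, 0.0, 0.0],
    [0.5, 2.0, 0.0, 0.0],
    [0.0, 0.0, 2.0, 0.7],
    [0.0, 0.0, 0.7, 2.0],
]
n_clusters, labels = response_clusters(block_omega)
print(n_clusters, labels)
```

A connected graph yields a single component, matching the case where no disjoint groups of responses are found.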