
Practical and theoretical aspects of mixture‐of‐experts modeling: An overview


Mixture‐of‐experts (MoE) models are a powerful paradigm for modeling data arising from complex data generating processes (DGPs). In this article, we demonstrate how different MoE models can be constructed to approximate the underlying DGPs of arbitrary types of data. Owing to the probabilistic nature of MoE models, we propose the maximum quasi‐likelihood (MQL) approach as a method for estimating MoE model parameters from data, and we provide conditions under which MQL estimators are consistent and asymptotically normal. The blockwise minorization–maximization (blockwise‐MM) framework is proposed as an all‐purpose method for constructing algorithms that compute MQL estimators, and an example derivation of a blockwise‐MM algorithm is provided. We then present a method for constructing information criteria for estimating the number of components in MoE models and provide justification for the classic Bayesian information criterion (BIC). Finally, we explain how MoE models can be used to conduct classification, clustering, and regression, and we illustrate these applications via two worked examples.

This article is categorized under:
Algorithmic Development > Statistics
Technologies > Structure Discovery and Clustering
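
The abstract outlines a full pipeline: specify an MoE model, estimate its parameters by MQL, and compute the estimates with a blockwise algorithm. As a rough illustration only, here is a minimal Python sketch assuming one common MoE characterization — softmax gates over Gaussian linear experts, f(y|x) = Σ_k gate_k(x) · N(y; x'β_k, σ_k²) — with a simple EM-type update standing in for the blockwise-MM scheme derived in the article. All function names and numerical choices are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: g-component mixture-of-experts regression with softmax gates
# and Gaussian linear experts, fitted by an EM-type blockwise algorithm.
# This is an assumption-laden illustration, not the article's method.
import numpy as np

def softmax(Z):
    # Row-wise softmax with the usual max-shift for numerical stability.
    Z = Z - Z.max(axis=1, keepdims=True)
    E = np.exp(Z)
    return E / E.sum(axis=1, keepdims=True)

def fit_moe(x, y, g, n_iter=200, seed=0):
    """Fit a g-component MoE with softmax gates and Gaussian linear experts."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    X = np.column_stack([np.ones(n), x])       # design matrix with intercept
    W = rng.normal(scale=0.1, size=(2, g))     # gating-network coefficients
    B = rng.normal(scale=0.1, size=(2, g))     # expert regression coefficients
    s2 = np.full(g, y.var())                   # expert noise variances
    for _ in range(n_iter):
        # E-step: responsibility of each expert for each observation.
        gates = softmax(X @ W)                                   # n x g
        mu = X @ B                                               # n x g expert means
        dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
        r = gates * dens + 1e-300
        r /= r.sum(axis=1, keepdims=True)
        # M-step, expert block: weighted least squares per expert.
        for k in range(g):
            A = X * r[:, [k]]
            B[:, k] = np.linalg.solve(X.T @ A + 1e-8 * np.eye(2), A.T @ y)
            resid = y - X @ B[:, k]
            s2[k] = max((r[:, k] * resid ** 2).sum() / (r[:, k].sum() + 1e-12), 1e-6)
        # M-step, gating block: one gradient-ascent step (a simple stand-in
        # for exactly maximizing this block, as a blockwise-MM scheme would).
        W += 0.5 * X.T @ (r - softmax(X @ W)) / n
    # Final observed-data log-likelihood, for later model comparison.
    gates, mu = softmax(X @ W), X @ B
    dens = np.exp(-0.5 * (y[:, None] - mu) ** 2 / s2) / np.sqrt(2 * np.pi * s2)
    loglik = np.log((gates * dens).sum(axis=1) + 1e-300).sum()
    return {"W": W, "B": B, "s2": s2, "loglik": loglik}
```

The alternation above mirrors the blockwise structure: responsibilities are computed once per iteration, after which the expert block and the gating block are updated separately.
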
Schematic diagram of the NN architecture of a g‐component MoE model, as defined by the characterizations given in the text.

Each segment is plotted as a solid curve and colored according to the MoE model component to which it is clustered. The dotted curves visualize the mean function of the corresponding component. The abscissa displays the time at which the signal is measured (normalized to the unit interval) and the ordinate displays the value of the signal, in watts.

The original signal is plotted as a solid curve and the fitted mean function of the selected g‐component MoE model is plotted as a dotted curve. The abscissa displays the time at which the signal is measured (normalized to the unit interval) and the ordinate displays the value of the signal, in watts.

BIC values for g ∈ [9] obtained from the MQL estimators on the n = 550 observations sampled from the electrical‐signal time series. The filled marker indicates the best model obtained via the BIC rule (sketched in code after these captions).

Instance of an electrical signal at a switching point, undergoing a switching operation. The abscissa displays the time at which the signal is measured (normalized to the unit interval) and the ordinate displays the value of the signal, in watts.

Plot of the additional n = 2500 observations sampled from the three‐class problem. The plot symbols indicate the true class labels yi, i ∈ [n]. The color indicates the classification by the fitted MoE classifier; blue, green, and red correspond to the three predicted classes, respectively.

BIC values for g ∈ [9] obtained from the MQL estimators on the n = 1000 observations sampled from the three‐class problem. The filled marker indicates the best model obtained via the BIC rule.

A realization of an n = 1000 observations sample from the three‐class problem. The plot symbols indicate the class labels.
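
Two of the captions above refer to choosing g ∈ [9] by the BIC rule: fit one model per candidate g, score each by BIC(g) = −2·loglik + d_g·log n, and keep the minimizer (the "filled marker" in the plots). A minimal sketch of that rule, reusing the hypothetical fit_moe from the previous sketch; the synthetic signal and the parameter count d_g are assumptions made to keep the example self-contained, not the article's data or exact bookkeeping.

```python
import numpy as np

# Synthetic stand-in for the n = 550 electrical-signal sample (assumption):
# a piecewise-linear signal with noise, time normalized to the unit interval.
rng = np.random.default_rng(1)
n = 550
x = np.sort(rng.uniform(0.0, 1.0, size=n))
y = np.where(x < 0.5, 1.0 + x, 3.0 - 2.0 * x) + rng.normal(scale=0.1, size=n)

bic = {}
for g in range(1, 10):                        # g in [9]
    fit = fit_moe(x, y, g)
    d = 2 * (g - 1) + 2 * g + g               # gating, expert, and variance parameters
    bic[g] = -2.0 * fit["loglik"] + d * np.log(n)

best_g = min(bic, key=bic.get)                # the model a filled marker would indicate
print("BIC by g:", {g: round(v, 1) for g, v in bic.items()}, "| selected g:", best_g)
```
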

