How to cite this WIREs title: WIREs Comp Stat

Learning under nonstationarity: covariate shift and class‐balance change



One of the fundamental assumptions behind many supervised machine‐learning algorithms is that training and test data follow the same probability distribution. However, this assumption is often violated in practice, for example, because of unavoidable sample selection bias or nonstationarity of the environment. When it is violated, standard machine‐learning methods suffer from significant estimation bias. In this article, we consider two scenarios of such distribution change: covariate shift, where input distributions differ, and class‐balance change, where class‐prior probabilities vary in classification. We then review semi‐supervised adaptation techniques based on importance weighting.

WIREs Comput Stat 2013, 5:465–477. doi: 10.1002/wics.1275

This article is categorized under:
Statistical Learning and Exploratory Methods of the Data Sciences > Clustering and Classification
Statistical Learning and Exploratory Methods of the Data Sciences > Pattern Recognition
Covariate shift. Input distributions change, but the conditional distribution of outputs given inputs does not. (a) Input densities and importance, (b) Learning target function, training samples, and test samples.
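Panel (a) pairs the input densities with the importance, the ratio w(x) = p′(x)/p(x) of test to training input density. The minimal Python sketch below shows why weighting by this ratio removes the bias mentioned in the abstract. Averaging a loss over training inputs estimates its training expectation, whereas averaging w(x) times the loss estimates its test expectation. The sketch makes several hypothetical assumptions: both densities are known Gaussians (p(x) = N(0, 1), p′(x) = N(1, 1)), x² stands in for the loss, and the names are illustrative. In practice w(x) has to be estimated from samples.

```python
import math
import random


def normal_pdf(x: float, mean: float, std: float = 1.0) -> float:
    """Density of the normal distribution N(mean, std**2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical covariate-shift setup: training inputs ~ p(x) = N(0, 1),
# test inputs ~ p'(x) = N(1, 1). Both densities are assumed known here.
TRAIN_MEAN, TEST_MEAN = 0.0, 1.0


def importance(x: float) -> float:
    """Importance weight w(x) = p'(x) / p(x)."""
    return normal_pdf(x, TEST_MEAN) / normal_pdf(x, TRAIN_MEAN)


def loss(x: float) -> float:
    """Stand-in per-sample loss (illustrative only): x**2."""
    return x * x


rng = random.Random(0)
train_xs = [rng.gauss(TRAIN_MEAN, 1.0) for _ in range(50_000)]

# Plain average: estimates the training expectation E_p[loss] (= 1 here).
plain = sum(loss(x) for x in train_xs) / len(train_xs)
# Importance-weighted average: estimates the test expectation E_p'[loss] (= 2 here).
weighted = sum(importance(x) * loss(x) for x in train_xs) / len(train_xs)

print(f"unweighted: {plain:.3f}, importance-weighted: {weighted:.3f}")
```

Under these assumptions E_p′[x²] = 1 + 1² = 2, whereas the unweighted average converges to E_p[x²] = 1.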
Results of class‐balance adaptation. Left: Squared error of class‐balance estimation. Right: Misclassification error by a weighted ‐regularized least‐squares classifier with weighted cross‐validation. (a) Australian dataset, (b) Diabetes dataset, (c) German dataset, (d) Statlogheart dataset.
Illustration of LSDD (least‐squares density‐difference estimation). ‘×’ denotes an estimated density‐difference value at x_i and . (a) Data, (b) Density difference.
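LSDD fits a model of the density difference directly, without estimating the two densities separately. The following one-dimensional sketch assumes a Gaussian-kernel model f(x) = Σ_l θ_l K(x, c_l) of p(x) − p′(x). The objective is the squared error ∫(f(x) − (p(x) − p′(x)))² dx plus a ridge penalty λ‖θ‖², and it is minimized by θ = (H + λI)⁻¹h. Here H_lm = ∫K(x, c_l)K(x, c_m)dx has a closed form, and h_l is the sample mean of K(·, c_l) under p minus that under p′. The following are illustrative assumptions, not the article's settings: the kernel width, λ, the grid of centres, the Gaussian data, and all helper names. The article may choose such parameters differently, for example by cross-validation.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def kernel(x, c, width):
    """Gaussian kernel K(x, c) = exp(-(x - c)^2 / (2 width^2))."""
    return math.exp(-((x - c) ** 2) / (2.0 * width**2))


def lsdd_fit(xs_p, xs_q, centers, width=0.8, lam=0.1):
    """Fit f(x) = sum_l theta_l K(x, c_l) to the density difference p(x) - q(x)."""
    # H[i][j] = integral of K(x, c_i) K(x, c_j) dx (closed form in 1-D), plus lam on the diagonal.
    h_mat = [
        [
            math.sqrt(math.pi) * width * math.exp(-((ci - cj) ** 2) / (4.0 * width**2))
            + (lam if i == j else 0.0)
            for j, cj in enumerate(centers)
        ]
        for i, ci in enumerate(centers)
    ]
    # h[i] = sample mean of K(x, c_i) under p minus sample mean under q.
    h_vec = [
        sum(kernel(x, c, width) for x in xs_p) / len(xs_p)
        - sum(kernel(x, c, width) for x in xs_q) / len(xs_q)
        for c in centers
    ]
    theta = solve(h_mat, h_vec)
    return lambda x: sum(t * kernel(x, c, width) for t, c in zip(theta, centers))


rng = random.Random(0)
xs_train = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # hypothetical samples from p = N(0, 1)
xs_test = [rng.gauss(1.0, 1.0) for _ in range(2000)]   # hypothetical samples from p' = N(1, 1)
centers = [-3.0 + 0.5 * k for k in range(15)]          # grid of kernel centres on [-3, 4]
diff = lsdd_fit(xs_train, xs_test, centers)
print(f"estimated p - p' at -1: {diff(-1.0):.3f}, at 2: {diff(2.0):.3f}")
```

For these densities the true difference is about 0.19 at x = −1 and about −0.19 at x = 2. The regularized estimate should follow the sign and rough size of these values.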
p′(y) can be estimated by fitting a mixture of the training class‐wise densities p(x|y) to the test input density p′(x).
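The caption describes estimating the test class prior p′(y) by fitting a mixture of training class-wise densities to the test input density. The sketch below fits the mixture weight by maximum likelihood with the classic EM iteration. This is a swapped-in fitting criterion, not necessarily the divergence the article minimizes. It assumes that the class-conditional densities do not change between phases, p(x|y) = p′(x|y). It further assumes they are known unit-variance Gaussians with means +1 and −1. These values and all names are hypothetical.

```python
import math
import random


def normal_pdf(x, mean, std=1.0):
    """Density of N(mean, std**2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


# Hypothetical class-conditional densities p(x|y), assumed identical in training and test.
CLASS_MEANS = {+1: 1.0, -1: -1.0}


def estimate_positive_prior(test_xs, init=0.5, max_iter=500, tol=1e-10):
    """Estimate theta = p'(y=+1) by fitting theta*p(x|+1) + (1-theta)*p(x|-1) to test inputs."""
    dens_pos = [normal_pdf(x, CLASS_MEANS[+1]) for x in test_xs]
    dens_neg = [normal_pdf(x, CLASS_MEANS[-1]) for x in test_xs]
    theta = init
    for _ in range(max_iter):
        # E-step: posterior probability of class +1 for each test input under the current theta.
        resp = [theta * a / (theta * a + (1.0 - theta) * b) for a, b in zip(dens_pos, dens_neg)]
        # M-step: the maximum-likelihood mixing weight is the mean responsibility.
        new_theta = sum(resp) / len(resp)
        converged = abs(new_theta - theta) < tol
        theta = new_theta
        if converged:
            break
    return theta


rng = random.Random(0)
# Unlabelled test inputs with a true positive-class prior of 0.3 (600 of 2000 points).
test_xs = [rng.gauss(CLASS_MEANS[+1], 1.0) for _ in range(600)]
test_xs += [rng.gauss(CLASS_MEANS[-1], 1.0) for _ in range(1400)]
theta_hat = estimate_positive_prior(test_xs)
print(f"estimated p'(y=+1): {theta_hat:.3f}")
```

Only the unlabelled test inputs are used here. The labels enter solely through the assumed class-conditional densities, which is what makes the adaptation semi-supervised.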
Change in class balances shifts the optimal classification boundary. Class‐conditional input density is the same between the training and test phases (i.e., p(x|y) = p′(x|y)), but class‐prior probabilities are different (i.e., p(y) ≠ p′(y)). (a) Training data, (b) Test data.
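The boundary shift can be seen in closed form for a simple hypothetical case. Take two classes with Gaussian class-conditional densities of common standard deviation σ and means μ₋ < μ₊. Setting p′(y = +1)p(x|+1) = p′(y = −1)p(x|−1) and solving for x gives the Bayes boundary x* = (μ₋ + μ₊)/2 + σ² ln(p′(y = −1)/p′(y = +1))/(μ₊ − μ₋). The class-conditional densities therefore fix the midpoint, while the class priors shift the boundary away from it. A short Python sketch of this formula, with illustrative default parameters:

```python
import math


def bayes_boundary(prior_pos, mean_neg=-1.0, mean_pos=1.0, std=1.0):
    """Point x* where p'(y=+1) p(x|+1) = p'(y=-1) p(x|-1) for equal-variance Gaussian classes.

    Assumes mean_pos > mean_neg, so the Bayes rule predicts +1 for x > x*.
    """
    midpoint = 0.5 * (mean_neg + mean_pos)
    return midpoint + std**2 * math.log((1.0 - prior_pos) / prior_pos) / (mean_pos - mean_neg)


print(bayes_boundary(0.5))  # 0.0: balanced classes, boundary at the midpoint
print(bayes_boundary(0.3))  # ≈ 0.424: boundary moves toward the rarer positive class
```

A boundary learned from balanced training data would sit at the midpoint. It would therefore be suboptimal once the test prior of the positive class drops to 0.3.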
3D human‐pose estimation error as a function of the number of training samples, averaged over all motions for each subject. The best method, and those comparable to it in terms of average error according to the paired t‐test at the 5% significance level, are marked by ‘○’. (a) Selection bias S1, (b) Subject transfer S1, (c) Selection bias S2, (d) Subject transfer S2, (e) Selection bias S3, (f) Subject transfer S3.
Illustration of RuLSIF (relative unconstrained least‐squares importance fitting). ‘×’ denotes an estimated relative importance value at x_i. (a) Training and test data, (b) Relative importance weight (β = 0.5).
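RuLSIF estimates the relative importance directly, without estimating either density. This sketch assumes the convention w^(β)(x) = p′(x)/(βp′(x) + (1 − β)p(x)), in which β = 0 recovers the ordinary importance. It also assumes a Gaussian-kernel model r(x) = Σ_l θ_l K(x, c_l) with centres taken from test samples. The fit minimizes (1/2)∫(r(x) − w^(β)(x))²(βp′(x) + (1 − β)p(x))dx. Expanding the square leaves only expectations under p′ and p, which are replaced by sample averages, and a ridge penalty (λ/2)‖θ‖² is added. The solution is θ = (H + λI)⁻¹h, with H = β·mean_test[φφᵀ] + (1 − β)·mean_train[φφᵀ] and h = mean_test[φ]. The Gaussian data match the relative-importance figure's setup (p′ = N(0, 1), p = N(0.5, 1)). The width, λ, number of centres, and names are illustrative assumptions.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def kernel(x, c, width):
    """Gaussian kernel K(x, c) = exp(-(x - c)^2 / (2 width^2))."""
    return math.exp(-((x - c) ** 2) / (2.0 * width**2))


def rulsif_fit(xs_train, xs_test, beta=0.5, width=1.0, lam=0.1, n_centers=20):
    """Fit r(x) = sum_l theta_l K(x, c_l) to w_beta(x) = p'(x) / (beta p'(x) + (1 - beta) p(x))."""
    centers = xs_test[:n_centers]  # kernel centres taken from test samples
    b = len(centers)
    # H = beta * mean_test[phi phi^T] + (1 - beta) * mean_train[phi phi^T]
    h_mat = [[0.0] * b for _ in range(b)]
    for xs, coef in ((xs_test, beta / len(xs_test)), (xs_train, (1.0 - beta) / len(xs_train))):
        for x in xs:
            phi = [kernel(x, c, width) for c in centers]
            for i in range(b):
                for j in range(b):
                    h_mat[i][j] += coef * phi[i] * phi[j]
    for i in range(b):
        h_mat[i][i] += lam  # ridge term
    # h = mean_test[phi]
    h_vec = [sum(kernel(x, c, width) for x in xs_test) / len(xs_test) for c in centers]
    theta = solve(h_mat, h_vec)
    return lambda x: sum(t * kernel(x, c, width) for t, c in zip(theta, centers))


rng = random.Random(0)
xs_test = [rng.gauss(0.0, 1.0) for _ in range(500)]   # samples from p'(x) = N(0, 1)
xs_train = [rng.gauss(0.5, 1.0) for _ in range(500)]  # samples from p(x) = N(0.5, 1)
w_hat = rulsif_fit(xs_train, xs_test)
print(f"estimated relative importance at -1: {w_hat(-1.0):.3f}, at 2: {w_hat(2.0):.3f}")
```

Under these densities the true values are about 1.30 at x = −1 and about 0.59 at x = 2. The regularized estimate is only expected to follow this decreasing trend, not to match it exactly.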
Relative importance. p′(x) is the normal distribution with mean 0 and variance 1, and p(x) is the normal distribution with mean 0.5 and variance 1. (a) Probability densities, (b) Relative importance w(β)(x).
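For the two Gaussians in this figure, the relative importance can be evaluated in closed form. The sketch assumes the definition w^(β)(x) = p′(x)/(βp′(x) + (1 − β)p(x)). Under this definition, β = 0 gives the ordinary importance p′(x)/p(x) and β = 1 gives the constant 1. For 0 < β < 1 the weight is bounded above by 1/β. The function names are hypothetical.

```python
import math


def normal_pdf(x, mean, std=1.0):
    """Density of N(mean, std**2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def relative_importance(x, beta):
    """w_beta(x) = p'(x) / (beta p'(x) + (1 - beta) p(x)) with p' = N(0, 1) and p = N(0.5, 1)."""
    p_test = normal_pdf(x, 0.0)
    p_train = normal_pdf(x, 0.5)
    return p_test / (beta * p_test + (1.0 - beta) * p_train)


for beta in (0.0, 0.5, 1.0):
    values = [round(relative_importance(x, beta), 3) for x in (-3.0, 0.0, 3.0)]
    print(f"beta={beta}: {values}")
```

Here the ordinary importance equals exp(0.125 − 0.5x), which grows without bound as x → −∞. With β = 0.5 the relative importance never exceeds 2, and this boundedness is the usual motivation for the relative variant.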
Regression under covariate shift. Dashed lines denote learned functions. (a) Ordinary least‐squares, (b) Importance‐weighted least‐squares.
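The two panels contrast ordinary and importance-weighted least squares. The sketch below fits a straight line, a deliberately misspecified model, to data from a hypothetical cubic target f(x) = −x + x³. Training inputs are drawn from N(0.5, 0.5²) and test inputs from N(0, 0.3²). Each squared residual is weighted by w(x) = p′(x)/p(x), computed here from the known densities, which focuses the fit on the test region. All distributions, sample sizes, and names are illustrative assumptions.

```python
import math
import random


def normal_pdf(x, mean, std):
    """Density of N(mean, std**2) at x."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def weighted_line_fit(xs, ys, ws):
    """Minimize sum_i w_i (y_i - a - b x_i)^2 and return (a, b)."""
    sw = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / sw
    y_bar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxy = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    sxx = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def target(x):
    return -x + x**3  # hypothetical true function


rng = random.Random(0)
x_train = [rng.gauss(0.5, 0.5) for _ in range(500)]              # p(x) = N(0.5, 0.5^2)
y_train = [target(x) + rng.gauss(0.0, 0.3) for x in x_train]      # noisy outputs
x_test = [rng.gauss(0.0, 0.3) for _ in range(500)]               # p'(x) = N(0, 0.3^2)

# Importance weights w(x) = p'(x) / p(x), here from the known densities.
weights = [normal_pdf(x, 0.0, 0.3) / normal_pdf(x, 0.5, 0.5) for x in x_train]


def mse_on_test(params):
    """Mean squared deviation from the noiseless target over the test inputs."""
    a, b = params
    return sum((target(x) - (a + b * x)) ** 2 for x in x_test) / len(x_test)


ols = weighted_line_fit(x_train, y_train, [1.0] * len(x_train))
iwls = weighted_line_fit(x_train, y_train, weights)
print(f"OLS test MSE: {mse_on_test(ols):.4f}, IWLS test MSE: {mse_on_test(iwls):.4f}")
```

Without weighting, the line fits the region where training inputs are dense and extrapolates poorly to the test region. In practice the weights would come from an importance estimator rather than from known densities.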
