Vapnik, VN. Statistical Learning Theory. New York, NY: Wiley; 1998.

Hastie, T, Tibshirani, R, Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. New York, NY: Springer; 2001.

Bishop, CM. Pattern Recognition and Machine Learning. New York, NY: Springer; 2006.

Shimodaira, H. Improving predictive inference under covariate shift by weighting the log‐likelihood function. J Stat Plann Inference 2000, 90:227–244.

Sugiyama, M, Kawanabe, M. Machine Learning in Non‐Stationary Environments: Introduction to Covariate Shift Adaptation. Cambridge, MA: MIT Press; 2012.

Saerens, M, Latinne, P, Decaestecker, C. Adjusting the outputs of a classifier to new a priori probabilities: a simple procedure. Neural Comput 2002, 14:21–41.

du Plessis, MC, Sugiyama, M. Semi‐supervised learning of class balance under class‐prior change by distribution matching. In: Langford, J, Pineau, J, eds. *Proceedings of the 29th International Conference on Machine Learning (ICML2012)*, Edinburgh, Scotland, June 26–July 1, 2012, 823–830.

Chapelle, O, Schölkopf, B, Zien, A, eds. Semi‐Supervised Learning. Cambridge, MA: MIT Press; 2006.

Wiens, DP. Robust weights and designs for biased regression models: least squares and generalized M‐estimation. J Stat Plann Inference 2000, 83:395–412.

Kanamori, T, Shimodaira, H. Active learning algorithm using the maximum weighted log‐likelihood estimator. J Stat Plann Inference 2003, 116:149–162.

Sugiyama, M. Active learning in approximately linear regression based on conditional expectation of generalization error. J Mach Learn Res 2006, 7:141–166.

Kanamori, T. Pool‐based active learning with optimal sampling distribution and its information geometrical interpretation. Neurocomputing 2007, 71:353–362.

Sugiyama, M, Rubens, N. A batch ensemble approach to active learning with model selection. Neural Netw 2008, 21:1278–1286.

Sugiyama, M, Nakajima, S. Pool‐based active learning in approximate linear regression. Mach Learn 2009, 75:249–274.

Ćwik, J, Mielniczuk, J. Estimating density ratio with application to discriminant analysis. Commun Stat: Theory Methods 1989, 18:3057–3069.

Chen, S‐M, Hsu, Y‐S, Liaw, J‐T. On kernel estimators of density ratio. Statistics 2009, 43:463–479.

Qin, J. Inferences for case‐control and semiparametric two‐sample density ratio models. Biometrika 1998, 85:619–630.

Cheng, KF, Chu, CK. Semiparametric density estimation under a two‐sample density ratio model. Bernoulli 2004, 10:583–604.

Bickel, S, Brückner, M, Scheffer, T. Discriminative learning for differing training and test distributions. In: *Proceedings of the 24th International Conference on Machine Learning (ICML2007)*; 2007, 81–88.

Gretton, A, Smola, A, Huang, J, Schmittfull, M, Borgwardt, K, Schölkopf, B. Covariate shift by kernel mean matching. In: Quiñonero‐Candela, J, Sugiyama, M, Schwaighofer, A, Lawrence, N, eds. Dataset Shift in Machine Learning. Cambridge, MA: MIT Press; 2009, 131–160.

Kanamori, T, Suzuki, T, Sugiyama, M. Statistical analysis of kernel‐based least‐squares density‐ratio estimation. Mach Learn 2012, 86:335–367.

Vapnik, VN, Braga, I, Izmailov, R. Constructive setting of the density ratio estimation problem and its rigorous solution. Technical Report 1306.0407, arXiv, 2013.

Que, Q, Belkin, M. Inverse density as an inverse problem: the Fredholm equation approach. Technical Report 1304.5575, arXiv, 2013.

Sugiyama, M, Suzuki, T, Nakajima, S, Kashima, H, von Bünau, P, Kawanabe, M. Direct importance estimation for covariate shift adaptation. Ann Inst Stat Math 2008, 60:699–746.

Nguyen, X, Wainwright, MJ, Jordan, MI. Estimating divergence functionals and the likelihood ratio by convex risk minimization. IEEE Trans Inf Theory 2010, 56:5847–5861.

Tsuboi, Y, Kashima, H, Hido, S, Bickel, S, Sugiyama, M. Direct density ratio estimation for large‐scale covariate shift adaptation. J Inf Process 2009, 17:138–155.

Yamada, M, Sugiyama, M. Direct importance estimation with Gaussian mixture models. IEICE Trans Inf Syst 2009, E92‐D:2159–2162.

Yamada, M, Sugiyama, M, Wichern, G, Simm, J. Direct importance estimation with a mixture of probabilistic principal component analyzers. IEICE Trans Inf Syst 2010, E93‐D:2846–2849.

Kanamori, T, Hido, S, Sugiyama, M. A least‐squares approach to direct importance estimation. J Mach Learn Res 2009, 10:1391–1445.

Sugiyama, M, Suzuki, T, Kanamori, T. Density ratio matching under the Bregman divergence: a unified framework of density ratio estimation. Ann Inst Stat Math 2012, 64:1009–1044.

Kanamori, T, Suzuki, T, Sugiyama, M. Computational complexity of kernel‐based density‐ratio estimation: a condition number analysis. Mach Learn 2013, 90:431–460.

Sugiyama, M, Kawanabe, M, Chui, PL. Dimensionality reduction for density ratio estimation in high‐dimensional spaces. Neural Netw 2010, 23:44–59.

Sugiyama, M, Yamada, M, von Bünau, P, Suzuki, T, Kanamori, T, Kawanabe, M. Direct density‐ratio estimation with dimensionality reduction via least‐squares hetero‐distributional subspace search. Neural Netw 2011, 24:183–198.

Yamada, M, Sugiyama, M. Direct density‐ratio estimation with dimensionality reduction via hetero‐distributional subspace analysis. In: *Proceedings of the Twenty‐Fifth AAAI Conference on Artificial Intelligence (AAAI2011)*, San Francisco, California, USA, August 7–11, 2011. The AAAI Press; 549–554.

Sugiyama, M, Suzuki, T, Kanamori, T. Density Ratio Estimation in Machine Learning. Cambridge, UK: Cambridge University Press; 2012.

Yamada, M, Suzuki, T, Kanamori, T, Hachiya, H, Sugiyama, M. Relative density‐ratio estimation for robust distribution comparison. Neural Comput 2013, 25:1324–1370.

Akaike, H. A new look at the statistical model identification. IEEE Trans Autom Control 1974, AC‐19:716–723.

Sugiyama, M, Ogawa, H. Subspace information criterion for model selection. Neural Comput 2001, 13:1863–1889.

Stone, M. Cross‐validatory choice and assessment of statistical predictions. J Roy Stat Soc Ser B 1974, 36:111–147.

Sugiyama, M, Müller, K‐R. Input‐dependent estimation of generalization error under covariate shift. Stat Decis 2005, 23:249–279.

Sugiyama, M, Krauledat, M, Müller, K‐R. Covariate shift adaptation by importance weighted cross validation. J Mach Learn Res 2007, 8:985–1005.

Li, Y, Kambara, H, Koike, Y, Sugiyama, M. Application of covariate shift adaptation techniques in brain computer interfaces. IEEE Trans Biomed Eng 2010, 57:1318–1324.

Hachiya, H, Akiyama, T, Sugiyama, M, Peters, J. Adaptive importance sampling for value function approximation in off‐policy reinforcement learning. Neural Netw 2009, 22:1399–1410.

Akiyama, T, Hachiya, H, Sugiyama, M. Efficient exploration through active learning for value function approximation in reinforcement learning. Neural Netw 2010, 23:639–648.

Hachiya, H, Peters, J, Sugiyama, M. Reward weighted regression with sample reuse. Neural Comput 2011, 23:2798–2832.

Zhao, T, Hachiya, H, Tangkaratt, V, Morimoto, J, Sugiyama, M. Efficient sample reuse in policy gradients with parameter‐based exploration. Neural Comput 2013, 25:1512–1547.

Yamada, M, Sugiyama, M, Matsui, T. Semi‐supervised speaker identification under covariate shift. Signal Process 2010, 90:2353–2361.

Ueki, K, Sugiyama, M, Ihara, Y. Lighting condition adaptation for perceived age estimation. IEICE Trans Inf Syst 2011, E94‐D:392–395.

Hachiya, H, Sugiyama, M, Ueda, N. Importance‐weighted least‐squares probabilistic classifier for covariate shift adaptation with application to human activity recognition. Neurocomputing 2012, 80:93–101.

Bickel, S, Scheffer, T. Dirichlet‐enhanced spam filtering based on biased samples. In: Schölkopf, B, Platt, J, Hoffman, T, eds. Advances in Neural Information Processing Systems. Cambridge, MA: MIT Press; 2007, 161–168.

Bickel, S, Sawade, C, Scheffer, T. Transfer learning by distribution matching for targeted advertising. In: Koller, D, Schuurmans, D, Bengio, Y, Bottou, L, eds. Advances in Neural Information Processing Systems. 2009, 145–152.

Bickel, S, Bogojeska, J, Lengauer, T, Scheffer, T. Multi‐task learning for HIV therapy screening. In: McCallum, A, Roweis, S, eds. *Proceedings of the 25th Annual International Conference on Machine Learning (ICML2008)*; 2008, 56–63.

Yamada, M, Sigal, L, Raptis, M. No bias left behind: covariate shift adaptation for discriminative 3D pose estimation. In: *Proceedings of the European Conference on Computer Vision (ECCV2012)*; 2012, 674–687.

Sigal, L, Black, MJ. HumanEva: synchronized video and motion capture dataset for evaluation of articulated human motion. Technical Report CS‐06‐08, Brown University, 2006.

Bo, L, Sminchisescu, C. Twin Gaussian processes for structured prediction. Int J Comput Vis 2010, 87:28–52.

Agarwal, A, Triggs, B. Monocular human motion capture with a mixture of regressors. In: *Proceedings of the IEEE Workshop on Vision for Human Computer Interaction at Computer Vision and Pattern Recognition*; 2005, p 72.

Shakhnarovich, G, Viola, P, Darrell, T. Fast pose estimation with parameter‐sensitive hashing. In: *Proceedings of the International Conference on Computer Vision (ICCV2003)*, vol. 2; 2003, 750–757.

Kullback, S, Leibler, RA. On information and sufficiency. Ann Math Stat 1951, 22:79–86.

Pearson, K. On the criterion that a given system of deviations from the probable in the case of a correlated system of variables is such that it can be reasonably supposed to have arisen from random sampling. Philos Mag Ser 5 1900, 50:157–175.

Sugiyama, M, Suzuki, T, Kanamori, T, du Plessis, MC, Liu, S, Takeuchi, I. Density‐difference estimation. Neural Comput 2013. In press.

Hall, P. On the non‐parametric estimation of mixture proportions. J Roy Stat Soc Ser B 1981, 43:147–156.

Titterington, DM. Minimum distance non‐parametric estimation of mixture proportions. J Roy Stat Soc Ser B 1983, 45:37–46.

Hall, P, Wand, MP. On nonparametric discrimination using density differences. Biometrika 1988, 75:541–547.

Anderson, N, Hall, P, Titterington, D. Two‐sample test statistics for measuring discrepancies between two multivariate probability density functions using kernel‐based density estimates. J Multivar Anal 1994, 50:41–54.

Kim, J, Scott, C. L2 kernel classification. IEEE Trans Pattern Anal Mach Intell 2010, 32:1822–1831.

Asuncion, A, Newman, DJ. UCI Machine Learning Repository, 2007. Available at: http://archive.ics.uci.edu/ml/ (Accessed August 22, 2013).

Härdle, W, Müller, M, Sperlich, S, Werwatz, A. Nonparametric and Semiparametric Models. Berlin, Germany: Springer; 2004.

Rifkin, R, Yeo, G, Poggio, T. Regularized least‐squares classification. In: Suykens, JAK, Horvath, G, Basu, S, Micchelli, C, Vandewalle, J, eds. Advances in Learning Theory: Methods, Models and Applications, Volume 190 of NATO Science Series III: Computer & Systems Sciences. Amsterdam, the Netherlands: IOS Press; 2003, 131–154.

Quiñonero‐Candela, J, Sugiyama, M, Schwaighofer, A, Lawrence, N, eds. Dataset Shift in Machine Learning. Cambridge, MA: MIT Press; 2009.

Caruana, R, Pratt, L, Thrun, S. Multitask learning. Mach Learn 1997, 28:41–75.

Raykar, VC, Yu, S, Zhao, LH, Valadez, GH, Florin, C, Bogoni, L, Moy, L. Learning from crowds. J Mach Learn Res 2010, 11:1297–1322.

Raina, R, Battle, A, Lee, H, Packer, B, Ng, A. Self‐taught learning: transfer learning from unlabeled data. In: Ghahramani, Z, ed. *Proceedings of the 24th Annual International Conference on Machine Learning (ICML2007)*. Omnipress; 2007, 759–766.