Hastie, T, Tibshirani, R, Friedman, JH. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer Series in Statistics. 2nd ed. New York: Springer; 2009.

Gordon, AD. Classification. Monographs on Statistics and Applied Probability. 2nd ed. Chapman & Hall; 1999.

Blum, A, Mitchell, T. Combining labeled and unlabeled data with co‐training. Proceedings of the 11th Annual Conference on Computational Learning Theory. 1998, 92–100.

Joachims, T. Transductive inference for text classification using support vector machines. Proceedings of the 16th International Conference on Machine Learning (ICML‐1999). 1999, 200–209.

Nigam, K, McCallum, AK, Thrun, S, Mitchell, T. Text classification from labeled and unlabeled documents using EM. Mach Learn 2000, 39:103–134.

Basu, S, Bilenko, M, Mooney, R. A probabilistic framework for semi‐supervised clustering. Proceedings of the 10th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. ACM, 2004, 59–68.

Forgy, EW. Cluster analysis of multivariate data: efficiency versus interpretability of classifications. Biometrics 1965, 21:768–769.

MacQueen, J. Some methods for classification and analysis of multivariate observations. In: Le Cam, LM, Neyman, J, eds. Proceedings of the 5th Berkeley Symposium on Mathematical Statistics and Probability, vol. 1. Berkeley, CA: University of California Press; 1967, 281–297.

Hartigan, JA, Wong, MA. Algorithm AS 136: A k‐means clustering algorithm. J R Stat Soc [Ser C]: Appl Stat 1979, 28:100–108.

Lloyd, S. Least squares quantization in PCM. IEEE Trans Inf Theory 1982, 28:129–137.

Tibshirani, R, Walther, G, Hastie, T. Estimating the number of clusters in a data set via the gap statistic. J R Stat Soc [Ser B]: Stat Methodol 2001, 63:411–423. doi: 10.1111/1467‐9868.00293.

Milligan, G, Cooper, M. An examination of procedures for determining the number of clusters in a data set. Psychometrika 1985, 50:159–179. doi: 10.1007/BF02294245.

Sugar, CA, James, GM. Finding the number of clusters in a dataset. J Am Stat Assoc 2003, 98:750–763. doi: 10.1198/016214503000000666.

Tibshirani, R, Walther, G. Cluster validation by prediction strength. J Comput Graph Stat 2005, 14:511–528. doi: 10.1198/106186005X59243.

Eisen, MB, Spellman, PT, Brown, PO, Botstein, D. Cluster analysis and display of genome‐wide expression patterns. Proc Natl Acad Sci 1998, 95:14863–14868.

Basu, S, Banerjee, A, Mooney, R. Semi‐supervised clustering by seeding. Proceedings of the 19th International Conference on Machine Learning (ICML‐2002). 2002, 19–26.

Gaynor, S, Bair, E. Identification of biologically relevant subtypes via preweighted sparse clustering. arXiv e‐prints 2013. arXiv:1304.3760. Available at: http://arxiv.org/abs/1304.3760.

Brown, MPS, Grundy, WN, Lin, D, Cristianini, N, Sugnet, CW, Furey, TS, Ares, M, Haussler, D. Knowledge‐based analysis of microarray gene expression data by using support vector machines. Proc Natl Acad Sci 2000, 97:262–267. doi: 10.1073/pnas.97.1.262.

Mateos, A, Dopazo, J, Jansen, R, Tu, Y, Gerstein, M, Stolovitzky, G. Systematic learning of gene functional classes from DNA array expression data by using multilayer perceptrons. Genome Res 2002, 12:1703–1715. doi: 10.1101/gr.192502.

Cheng, J, Cline, M, Martin, J, Finkelstein, D, Awad, T, Kulp, D, Siani‐Rose, MA. A knowledge‐based clustering algorithm driven by gene ontology. J Biopharm Stat 2004, 14:687–700. doi: 10.1081/BIP‐200025659.

Qu, Y, Xu, S. Supervised cluster analysis for microarray data based on multivariate Gaussian mixture. Bioinformatics 2004, 20:1905–1913. doi: 10.1093/bioinformatics/bth177.

Fang, Z, Yang, J, Li, Y, Luo, Q, Liu, L. Knowledge guided analysis of microarray data. J Biomed Inform 2006, 39:401–411.

Huang, D, Pan, W. Incorporating biological knowledge into distance‐based clustering analysis of microarray gene expression data. Bioinformatics 2006, 22:1259–1268. doi: 10.1093/bioinformatics/btl065.

Brameier, M, Wiuf, C. Co‐clustering and visualization of gene expression data and gene ontology terms for Saccharomyces cerevisiae using self‐organizing maps. J Biomed Inform 2007, 40:160–173. doi: 10.1016/j.jbi.2006.05.001.

Chopra, P, Kang, J, Yang, J, Cho, H, Kim, H, Lee, MG. Microarray data mining using landmark gene‐guided clustering. BMC Bioinformatics 2008, 9:92. doi: 10.1186/1471‐2105‐9‐92.

Tari, L, Baral, C, Kim, S. Fuzzy c‐means clustering with prior biological knowledge. J Biomed Inform 2009, 42:74–81. doi: 10.1016/j.jbi.2008.05.009.

Basu, S, Davidson, I, Wagstaff, K. Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series. Boca Raton, FL: CRC Press; 2009.

Wagstaff, K, Cardie, C, Rogers, S, Schrödl, S. Constrained k‐means clustering with background knowledge. Proceedings of the 18th International Conference on Machine Learning (ICML‐2001). 2001, 577–584.

Basu, S, Banerjee, A, Mooney, R. Active semi‐supervision for pairwise constrained clustering. Proceedings of the 4th SIAM International Conference on Data Mining (SDM‐2004). 2004, 333–344.

Bilenko, M, Basu, S, Mooney, R. Integrating constraints and metric learning in semi‐supervised clustering. Proceedings of the 21st International Conference on Machine Learning (ICML‐2004). 2004, 81–88.

Klein, D, Kamvar, S, Manning, C. From instance‐level constraints to space‐level constraints: making the most of prior knowledge in data clustering. Proceedings of the 19th International Conference on Machine Learning (ICML‐2002). 2002, 307–314.

Bilenko, M, Mooney, R. Adaptive duplicate detection using learnable string similarity measures. Proceedings of the 9th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2003, 39–48.

Xing, E, Ng, A, Jordan, M, Russell, S. Distance metric learning, with application to clustering with side‐information. Adv Neural Inf Process Syst 2003, 15:505–512.

Bar‐Hillel, A, Hertz, T, Shental, N, Weinshall, D. Learning distance functions using equivalence relations. Proceedings of the 20th International Conference on Machine Learning (ICML‐2003). 2003, 11–18.

Kamvar, S, Klein, D, Manning, C. Spectral learning. Proceedings of the 18th International Joint Conference on Artificial Intelligence. 2003, 561–566.

Chang, H, Yeung, DY. Locally linear metric adaptation for semi‐supervised clustering. Proceedings of the 21st International Conference on Machine Learning (ICML‐2004). 2004, 153–160.

Lange, T, Law, M, Jain, A, Buhmann, J. Learning with constrained and unlabelled data. Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR 2005). 2005, 731–738.

Handl, J, Knowles, J. On semi‐supervised clustering via multiobjective optimization. Proceedings of the 8th Annual Conference on Genetic and Evolutionary Computation (GECCO 2006). 2006, 1465–1472.

Li, T, Ding, C, Jordan, M. Solving consensus and semi‐supervised clustering problems using nonnegative matrix factorization. Proceedings of the 7th IEEE International Conference on Data Mining (ICDM 2007). 2007, 577–582.

Xiang, S, Nie, F, Zhang, C. Learning a Mahalanobis distance metric for data clustering and classification. Pattern Recogn 2008, 41:3600–3612. doi: 10.1016/j.patcog.2008.05.018.

Wang, F, Li, T, Zhang, C. Semi‐supervised clustering via matrix factorization. Proceedings of the 8th SIAM International Conference on Data Mining (SDM‐2008). 2008, 1–12.

Cohn, D, Caruana, R, McCallum, A. Semi‐supervised clustering with user feedback. In: Basu, S, Davidson, I, Wagstaff, K, eds. Constrained Clustering: Advances in Algorithms, Theory, and Applications. Chapman & Hall/CRC Data Mining and Knowledge Discovery Series, chapter 2. Boca Raton, FL: CRC Press; 2009, 17–31.

Yin, X, Chen, S, Hu, E, Zhang, D. Semi‐supervised clustering with metric learning: an adaptive kernel method. Pattern Recogn 2010, 43:1320–1333. doi: 10.1016/j.patcog.2009.11.005.

Davidson, I, Ravi, S. Clustering with constraints: Feasibility issues and the k‐means algorithm. Proceedings of the 5th SIAM International Conference on Data Mining (SDM‐2005). 2005a, 138–149.

Davidson, I, Ravi, S. Agglomerative hierarchical clustering with constraints: theoretical and empirical results. Knowledge Discovery in Databases: PKDD 2005. 2005b, 59–70.

Law, M, Topchy, A, Jain, A. Model‐based clustering with probabilistic constraints. Proceedings of the 5th SIAM International Conference on Data Mining (SDM‐2005). 2005, 641–645.

Lu, Z, Leen, T. Semi‐supervised learning with penalized probabilistic clustering. Adv Neural Inf Process Syst 2005, 17:849–856.

Tang, W, Xiong, H, Zhong, S, Wu, J. Enhancing semi‐supervised clustering: a feature projection perspective. Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2007, 707–716.

Kulis, B, Basu, S, Dhillon, I, Mooney, R. Semi‐supervised graph clustering: a kernel approach. Mach Learn 2009, 74:1–22. doi: 10.1007/s10994‐008‐5084‐4.

Yoshida, T, Okatani, K. A graph‐based projection approach for semi‐supervised clustering. In: Kang, BH, Richards, D, eds. Knowledge Management and Acquisition for Smart Systems and Services. Lecture Notes in Computer Science. Berlin, Germany: Springer‐Verlag; 2010, 1–13.

Greene, D, Cunningham, P. Constraint selection by committee: an ensemble approach to identifying informative constraints for semi‐supervised clustering. Proceedings of the 18th European Conference on Machine Learning (ECML 2007). 2007, 140–151.

Mallapragada, P, Jin, R, Jain, A. Active query selection for semi‐supervised clustering. 19th International Conference on Pattern Recognition (ICPR 2008). IEEE, 2008, 1–4.

Zheng, L, Li, T. Semi‐supervised hierarchical clustering. Proceedings of the 11th IEEE International Conference on Data Mining (ICDM 2011). 2011, 982–991. doi: 10.1109/ICDM.2011.130.

Miyamoto, S, Terami, A. Semi‐supervised agglomerative hierarchical clustering algorithms with pairwise constraints. Proceedings of the 2010 IEEE International Conference on Fuzzy Systems (FUZZ 2010). 2010, 1–6. doi: 10.1109/FUZZY.2010.5584625.

Davidson, I, Ravi, S. Using instance‐level constraints in agglomerative hierarchical clustering: theoretical and empirical results. Data Mining and Knowledge Discovery 2009, 18:257–282. doi: 10.1007/s10618‐008‐0103‐4.

Miyamoto, S, Terami, A. Constrained agglomerative hierarchical clustering algorithms with penalties. Proceedings of the 2011 IEEE International Conference on Fuzzy Systems (FUZZ 2011). 2011, 422–427. doi: 10.1109/FUZZY.2011.6007351.

Bade, K, Nürnberger, A. Personalized hierarchical clustering. Proceedings of the 2006 IEEE/WIC/ACM International Conference on Web Intelligence (WI 2006). 2006, 181–187. doi: 10.1109/WI.2006.131.

Zhao, H, Qi, Z. Hierarchical agglomerative clustering with ordering constraints. Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining (WKDD 2010). 2010, 195–199. doi: 10.1109/WKDD.2010.123.

Hamasuna, Y, Endo, Y, Miyamoto, S. Semi‐supervised agglomerative hierarchical clustering using clusterwise tolerance based pairwise constraints. Proceedings of the 7th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2010). 2010, 152–162.

Hamasuna, Y, Endo, Y, Miyamoto, S. Semi‐supervised agglomerative hierarchical clustering with Ward method using clusterwise tolerance. Proceedings of the 8th International Conference on Modeling Decisions for Artificial Intelligence (MDAI 2011). 2011, 103–113.

Hamasuna, Y, Endo, Y, Miyamoto, S. On agglomerative hierarchical clustering using clusterwise tolerance based pairwise constraints. J Adv Comput Intell Intell Inform 2012, 16:174–179.

Bair, E, Tibshirani, R. Semi‐supervised methods to predict patient survival from gene expression data. PLoS Biol 2004, 2:e108. doi: 10.1371/journal.pbio.0020108.

Nowak, G, Tibshirani, R. Complementary hierarchical clustering. Biostatistics 2008, 9:467–483. doi: 10.1093/biostatistics/kxm046.

Bullinger, L, Döhner, K, Bair, E, Fröhling, S, Schlenk, R, Tibshirani, R, Döhner, H, Pollack, JR. Gene expression profiling identifies new subclasses and improves outcome prediction in adult myeloid leukemia. N Engl J Med 2004, 350:1605–1616.

Koestler, DC, Marsit, CJ, Christensen, BC, Karagas, MR, Bueno, R, Sugarbaker, DJ, Kelsey, KT, Houseman, EA. Semi‐supervised recursively partitioned mixture models for identifying cancer subtypes. Bioinformatics 2010, 26:2578–2585. doi: 10.1093/bioinformatics/btq470.

Houseman, EA, Christensen, B, Yeh, RF, Marsit, C, Karagas, M, Wrensch, M, Nelson, H, Wiemels, J, Zheng, S, Wiencke, J, et al. Model‐based clustering of DNA methylation array data: a recursive‐partitioning algorithm for high‐dimensional data arising as a mixture of beta distributions. BMC Bioinformatics 2008, 9:365. doi: 10.1186/1471‐2105‐9‐365.

Witten, DM, Tibshirani, R. A framework for feature selection in clustering. J Am Stat Assoc 2010, 105:713–726. doi: 10.1198/jasa.2010.tm09415.

Tibshirani, R. Regression shrinkage and selection via the lasso. J R Stat Soc [Ser B] 1996, 58:267–288.

Ghosh, D, Chinnaiyan, AM. Mixture modelling of gene expression data from microarray experiments. Bioinformatics 2002, 18:275–286. doi: 10.1093/bioinformatics/18.2.275.

Liu, JS, Zhang, JL, Palumbo, MJ, Lawrence, CE. Bayesian clustering with variable and transformation selections. Bayesian Statistics 7: Proceedings of the Seventh Valencia International Meeting. 2003, 249–275.