Shannon, CE. A mathematical theory of communication. Bell Syst Tech J 1948, 27:379–423 and 623–656.
Brown, P, Della Pietra, S, Della Pietra, V, Lai, J, Mercer, R. An estimate of an upper bound for the entropy of English. Comput Linguist 1992, 18:31–40.
Dempster, AP, Laird, NM, Rubin, DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 1977, 39:1–38.
Chen, SF. Building probabilistic models for natural language. Doctoral dissertation. Harvard University; 1996.
Weaver, W. Translation. In: Locke, WN, Booth, AD, eds. Machine Translation of Languages: Fourteen Essays. Cambridge, MA: Technology Press of the Massachusetts Institute of Technology; 1955.
Brown, P, Della Pietra, S, Della Pietra, V, Mercer, R. The mathematics of statistical machine translation: parameter estimation. Comput Linguist 1993, 19:263–312.
Berger, A, Della Pietra, S, Della Pietra, V. A maximum entropy approach to natural language processing. Comput Linguist 1996, 22:39–72.
Freund, Y, Schapire, RE. Large margin classification using the perceptron algorithm. Mach Learn 1999, 37:277–296.
Freund, Y, Schapire, RE. A decision‐theoretic generalization of on‐line learning and an application to boosting. J Comput Syst Sci 1997, 55:119–139.
Burges, CJC. A tutorial on support vector machines for pattern recognition. Data Mining Knowl Discov 1998, 2:121–167.
Vapnik, VN. Statistical Learning Theory. New York: John Wiley & Sons; 1998.
Dietterich, TG, Bakiri, G. Solving multiclass learning problems via error‐correcting output codes. J Artif Intell Res 1995, 2:263–286.
Schölkopf, B, Smola, AJ. Learning with Kernels. Cambridge, MA: MIT Press; 2002.
Collins, M, Duffy, N. "New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron". Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics; 2002.
Yarowsky, D. "Unsupervised word sense disambiguation rivaling supervised methods". Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics; 1995, 189–196.
Blum, A, Mitchell, T. "Combining labeled and unlabeled data with co‐training". Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT). San Francisco, CA: Morgan Kaufmann Publishers; 1998, 92–100.
Zhu, X, Ghahramani, Z, Lafferty, J. Semi‐supervised learning using Gaussian fields and harmonic functions. In: Machine Learning: Proceedings of the 20th International Conference (ICML), Washington, DC: International Machine Learning Society; 2003.
Zipf, GK. Human Behaviour and the Principle of Least Effort. Reading, MA: Addison‐Wesley; 1949.
Newman, MEJ. Power laws, Pareto distributions and Zipf's law. Contemp Phys 2005, 46:323–351.
Church, KW, Gale, WA. Poisson mixtures. Nat Lang Eng 1995, 1:163–190.
Deerwester, S, Dumais, ST, Furnas, GW, Landauer, TK, Harshman, R. Indexing by latent semantic analysis. J Am Soc Inf Sci 1990, 41:391–407.
Brent, MR. "Automatic acquisition of subcategorization frames from untagged text". Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics; 1991, 209–214.
Resnik, P. Selection and information. Doctoral dissertation. University of Pennsylvania; 1993.
Hindle, D, Rooth, M. Structural ambiguity and lexical relations. Comput Linguist 1993, 19:103–120.
Finch, SP. Finding structure in language. Doctoral dissertation. University of Edinburgh; 1993.
Chi, Z, Geman, S. Estimation of probabilistic context‐free grammars. Comput Linguist 1998, 24:299–305.
Lari, K, Young, SJ. The estimation of stochastic context‐free grammars using the Inside‐Outside algorithm. Comput Speech Lang 1990, 4:35–56.
Magerman, D. Natural language parsing as statistical pattern recognition. Doctoral dissertation. Stanford University; 1994.
Collins, M. Head‐driven statistical models for natural language parsing. Doctoral dissertation. University of Pennsylvania; 1999.
Charniak, E. "A maximum‐entropy‐inspired parser". Proceedings of the Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL). Stroudsburg, PA: Association for Computational Linguistics; 2000, 132–139.
Collins, M. Discriminative reranking for natural language parsing. Comput Linguist 2005, 31:25–70.
Abney, SP. Stochastic attribute‐value grammars. Comput Linguist 1997, 23:597–618.
Resnik, P. "Probabilistic Tree‐Adjoining Grammar as a framework for statistical natural language processing". Proceedings of the International Conference on Computational Linguistics (COLING). Sheffield: International Committee on Computational Linguistics; 1992, 418–424.
Freund, Y, Kearns, M, Ron, D, Rubinfeld, R, Schapire, RE, Sellie, L. "Efficient learning of typical finite automata from random walks". Proceedings of the 25th ACM Symposium on the Theory of Computing. New York, NY: ACM Press; 1993, 315–341.
Fu, KS, Booth, TL. Grammatical inference: introduction and survey. IEEE Trans Syst Man Cybern 1975, 5:59–72 and 409–423.
Goldsmith, J. Unsupervised learning of the morphology of a natural language. Comput Linguist 2001, 27:153–198.
de Marcken, C. Unsupervised language acquisition. Doctoral dissertation. Massachusetts Institute of Technology; 1996.
Horning, JJ. A study of grammatical inference. Doctoral dissertation. Stanford University; 1969.
Stolcke, A, Omohundro, S. "Inducing probabilistic grammars by Bayesian model merging". Grammatical Inference and Applications, Second International Colloquium on Grammatical Inference. Berlin: Springer Verlag; 1994.
Wolff, JG. Language acquisition, data compression and generalization. Lang Commun 1982, 2:57–89.
Bechet, D. K‐valued link grammars are learnable from strings. In: Proceedings of the 8th Conference on Formal Grammar. Formal Grammar Committee, Haifa; 2003.
Klein, D. The unsupervised learning of natural language structure. Doctoral dissertation. Stanford University; 2005.
Rissanen, J. Modeling by shortest data description. Automatica 1978, 14:465–471.