1 Shannon, CE. A mathematical theory of communication. Bell Syst Tech J 1948, 27:379–423 and 623–656.
2 Brown, P, Della Pietra, S, Della Pietra, V, Lai, J, Mercer, R. An estimate of an upper bound for the entropy of English. Comput Linguist 1992, 18:31–40.
3 Dempster, AP, Laird, NM, Rubin, DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 1977, 39:1–38.
4 Chen, SF. Building probabilistic models for natural language. Doctoral dissertation. Harvard University; 1996.
5 Weaver, W. Translation. In: Locke, WN, Booth, AD, eds. Machine Translation of Languages: Fourteen Essays. Cambridge, MA: Technology Press of the Massachusetts Institute of Technology
6 Brown, P, Della Pietra, S, Della Pietra, V, Mercer, R. The mathematics of statistical machine translation: parameter estimation. Comput Linguist 1993, 19:263–312.
7 Berger, A, Della Pietra, S, Della Pietra, V. A maximum entropy approach to natural language processing. Comput Linguist 1996, 22:39–72.
8 Freund, Y, Schapire, RE. Large margin classification using the perceptron algorithm. Mach Learn 1999, 37:277–296.
9 Freund, Y, Schapire, RE. A decision‐theoretic generalization of on‐line learning and an application to boosting. J Comput Syst Sci 1997, 55:119–139.
10 Burges, CJC. A tutorial on support vector machines for pattern recognition. Data Mining Knowl Discov 1998, 2:121–167.
11 Vapnik, VN. Statistical Learning Theory. New York: John Wiley & Sons
12 Dietterich, TG, Bakiri, G. Solving multiclass learning problems via error‐correcting output codes. J Artif Intell Res 1995, 2:263–286.
13 Schölkopf, B, Smola, AJ. Learning with Kernels. Cambridge, MA: MIT Press
14 Collins, M, Duffy, N. New ranking algorithms for parsing and tagging: kernels over discrete structures, and the voted perceptron. Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics; 2002.
15 Yarowsky, D. Unsupervised word sense disambiguation rivaling supervised methods. Proceedings of the 33rd Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics; 1995, 189–196.
16 Blum, A, Mitchell, T. Combining labeled and unlabeled data with co‐training. Proceedings of the 11th Annual Conference on Computational Learning Theory (COLT). San Francisco, CA: Morgan Kaufmann Publishers; 1998, 92–100.
17 Zhu, X, Ghahramani, Z, Lafferty, J. Semi‐supervised learning using Gaussian fields and harmonic functions. In: Machine Learning: Proceedings of the 20th International Conference (ICML). Washington, DC: International Machine Learning Society; 2003.
18 Zipf, GK. Human Behaviour and the Principle of Least Effort. Reading, MA: Addison‐Wesley
19 Newman, MEJ. Power laws, Pareto distributions and Zipf's law. Contemp Phys 2005, 46:323–351.
20 Church, KW, Gale, WA. Poisson mixtures. Nat Lang Eng 1995, 1:163–190.
21 Deerwester, S, Dumais, ST, Furnas, GW, Landauer, TK, Harshman, R. Indexing by latent semantic analysis. J Am Soc Inf Sci 1990, 41:391–407.
22 Brent, MR. Automatic acquisition of subcategorization frames from untagged text. Proceedings of the 29th Annual Meeting of the Association for Computational Linguistics. Stroudsburg, PA: Association for Computational Linguistics; 1991, 209–214.
23 Resnik, P. Selection and information. Doctoral dissertation. University of Pennsylvania; 1993.
24 Hindle, D, Rooth, M. Structural ambiguity and lexical relations. Comput Linguist 1993, 19:103–120.
25 Finch, SP. Finding structure in language. Doctoral dissertation. University of Edinburgh; 1993.
26 Chi, Z, Geman, S. Estimation of probabilistic context‐free grammars. Comput Linguist 1998, 24:299–305.
27 Lari, K, Young, SJ. The estimation of stochastic context‐free grammars using the Inside‐Outside algorithm. Comput Speech Lang 1990, 4:35–56.
28 Magerman, D. Natural language parsing as statistical pattern recognition. Doctoral dissertation. Stanford University; 1994.
29 Collins, M. Head‐driven statistical models for natural language parsing. Doctoral dissertation. University of Pennsylvania; 1999.
30 Charniak, E. A maximum‐entropy‐inspired parser. Proceedings of the Meeting of the North American Chapter of the Association for Computational Linguistics (NAACL). Stroudsburg, PA: Association for Computational Linguistics; 2000, 132–139.
31 Collins, M. Discriminative reranking for natural language parsing. Comput Linguist 2005, 31:25–70.
32 Abney, SP. Stochastic attribute‐value grammars. Comput Linguist 1997, 23:597–618.
33 Resnik, P. Probabilistic Tree‐Adjoining Grammar as a framework for statistical natural language processing. Proceedings of the International Conference on Computational Linguistics (COLING). Sheffield: International Committee on Computational Linguistics; 1992, 418–424.
34 Freund, Y, Kearns, M, Ron, D, Rubinfeld, R, Schapire, RE, Sellie, L. Efficient learning of typical finite automata from random walks. Proceedings of the 25th ACM Symposium on the Theory of Computing. New York, NY: ACM Press; 1993, 315–341.
35 Fu, KS, Booth, TL. Grammatical inference: introduction and survey. IEEE Trans Syst Man Cybern 1975, 5:59–72 and 409–423.
36 Goldsmith, J. Unsupervised learning of the morphology of a natural language. Comput Linguist 2001, 27:153–198.
37 de Marcken, C. Unsupervised language acquisition. Doctoral dissertation. Massachusetts Institute of Technology; 1996.
38 Horning, JJ. A study of grammatical inference. Doctoral dissertation. Stanford University; 1969.
39 Stolcke, A, Omohundro, S. Inducing probabilistic grammars by Bayesian model merging. Grammatical Inference and Applications, Second International Colloquium on Grammatical Inference. Berlin: Springer Verlag
40 Wolff, JG. Language acquisition, data compression and generalization. Lang Commun 1982, 2:57–89.
41 Bechet, D. K‐valued link grammars are learnable from strings. In: Proceedings of the 8th Conference on Formal Grammar. Formal Grammar Committee, Haifa; 2003.
42 Klein, D. The unsupervised learning of natural language structure. Doctoral dissertation. Stanford University; 2005.
43 Rissanen, J. Modeling by shortest data description. Automatica 1978, 14:465–471.