http://www.aclweb.org/ (Accessed May 2, 2015).
Association for Computational Linguistics. What is computational linguistics? Available at: http://www.aclweb.org/archive/misc/what.html (Accessed February 2005).
Computational Linguistics. Wikipedia. Available at: http://en.wikipedia.org/wiki/Computational_Linguistics (Accessed November 17, 2010).
Johnson, M. How the statistical revolution changes (computational) linguistics. In: Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics, Athens, Greece, March, 2009, 3–11.
Hearst, MA. Untangling text data mining. In: Proceedings of ACL ’99: The 37th Annual Meeting of the Association for Computational Linguistics, University of Maryland, June, 1999.
Maitra, R. A statistical perspective on data mining. Available at: http://www.public.iastate.edu/~maitra/papers/datamining.pdf
Jurafsky, D, Martin, JH. Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition. 2nd ed. Upper Saddle River, NJ: Prentice Hall; 2008.
Manning, CD, Schütze, H. Foundations of Statistical Natural Language Processing. Cambridge, MA: The MIT Press; 2000.
Krahmer, E. What computational linguists can learn from psychologists (and vice versa). Comput Linguist 2010, 36:285–294.
Martinez, AR. Natural language processing. WIREs Comput Stat 2010, 2:352–357.
Frege, G. Begriffsschrift, eine der arithmetischen nachgebildete Formelsprache des reinen Denkens. Halle a. S.: Louis Nebert; 1879. Translation: Concept Script, a formal language of pure thought modelled upon that of arithmetic, by S. Bauer‐Mengelberg in Jean Van Heijenoort, ed. From Frege to Gödel: A Source Book in Mathematical Logic, 1879–1931. Harvard University Press; 1967.
Partee, B. Lexical semantics and compositionality. In: Gleitman, L, Liberman, M, eds. Invitation to Cognitive Science, Part I: Language. Cambridge, MA: The MIT Press; 1995.
Mitchell, J. Composition in distributional models of semantics. PhD Dissertation, School of Informatics, University of Edinburgh, 2011.
Honeybone, P. J.R. Firth. In: Chapman, S, Routledge, C, eds. Key Thinkers in Linguistics and the Philosophy of Language. Edinburgh: Edinburgh University Press; 2005, 80–86.
Harris, ZS. Distributional structure. Word 1954, 10:146–162.
Landauer, TK, Dumais, ST. A solution to Plato's problem: the latent semantic analysis theory of acquisition, induction and representation of knowledge. Psychol Rev 1997, 104:211–240.
Salton, G, Wong, A, Yang, CS. A vector space model for automatic indexing. Commun ACM 1975, 18:613–620.
Salton, G. Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Reading, MA: Addison‐Wesley; 1989.
Landauer, TK, Foltz, PW, Laham, D. An introduction to latent semantic analysis. In: Discourse Processes, vol. 25. Mahwah, NJ: Lawrence Erlbaum Associates, Inc; 1998, 259–284.
Coccaro, N, Jurafsky, D. Toward better integration of semantic predictors in statistical language modeling. In: Proceedings of the 5th International Conference on Spoken Language Processing, Sydney, Australia, 1998, 2403–2406.
Bellegarda, JR. Exploiting latent semantic information in statistical language modeling. Proc IEEE 2000, 88:1276–1296.
Jessup, ER, Martin, JH. Taking a new look at the latent semantic analysis approach to information retrieval. In: Berry, MW, ed. Computational Information Retrieval. Philadelphia, PA: SIAM; 2000, 121–144.
Washtell, J. Compositional expectation: a purely distributional model of compositional semantics. In: Proceedings of the Ninth International Conference on Computational Semantics, Oxford, UK, 2011.
Griffiths, TL, Steyvers, M, Tenenbaum, JB. Topics in semantic representation. Psychol Rev 2007, 114:211–244.
Hofmann, T. Probabilistic latent semantic indexing. In: Proceedings of the 22nd International SIGIR Conference on Research and Development in Information Retrieval, Berkeley, CA, 1999. Available at: http://cs.brown.edu/~th/papers/Hofmann-SIGIR99.pdf (Accessed December 2014).
Dempster, AP, Laird, NM, Rubin, DB. Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B 1977, 39:1–38.
Blei, DM, Ng, AY, Jordan, MI. Latent Dirichlet allocation. J Mach Learn Res 2003, 3:993–1022.
Steyvers, M, Griffiths, TL. Probabilistic topic models. In: Landauer, T, McNamara, D, Dennis, S, Kintsch, W, eds. Handbook of Latent Semantic Analysis. New York, NY: Psychology Press; 2007. Available at: http://psiexp.ss.uci.edu/research/papers/SteyversGriffithsLSABookFormatted.pdf.
Anthes, G. Topic models vs unstructured data. Commun ACM 2010, 53:16–18.
Ramage, D, Dumais, ST, Liebling, DJ. Characterizing microblogs with topic models. In: Proceedings of ICWSM, Washington, DC, 2010. Available at: http://nlp.stanford.edu/pubs/twitter-icwsm10.pdf (Accessed December 2014).
Hopcroft, JE, Ullman, JD. Introduction to Automata Theory, Languages, and Computation. Reading, MA: Addison‐Wesley Publishing Company; 1979.
Revesz, GE. Introduction to Formal Languages. New York: McGraw‐Hill Book Company; 1983.
Cole, R. Survey of the State of the Art in Human Language Technology. New York, NY: Cambridge University Press; 1997.
Lewis, HR, Papadimitriou, CH. Elements of the Theory of Computation. Upper Saddle River, NJ: Prentice Hall; 1981.
Charniak, E. Statistical Language Learning. Cambridge, MA: The MIT Press; 1996.
Berry, MW, Browne, M. Understanding Search Engines: Mathematical Modeling and Text Retrieval. Philadelphia, PA: SIAM; 1999.
Dumais, ST. Improving the retrieval of information from external sources. Behav Res Methods Instrum Comput 1991, 23:229–236.
Porter, MF. An algorithm for suffix stripping. Program 2006, 40:211–218.
Deerwester, S, Dumais, ST, Furnas, GW, Landauer, TK, Harshman, R. Indexing by latent semantic analysis. J Am Soc Inf Sci 1990, 41:391–407.
Hand, D, Mannila, H, Smyth, P. Principles of Data Mining. Cambridge, MA: The MIT Press; 2001.
Berry, MW, Drmac, Z, Jessup, ER. Matrices, vector spaces, and information retrieval. SIAM Rev 1999, 41:335–362.
Mosteller, F, Wallace, DL. Inference in an authorship problem: a comparative study of discrimination methods applied to the authorship of the disputed Federalist papers. J Am Stat Assoc 1963, 58:275–309.
Harish, BS, Guru, DS, Manjunath, S. Representation and classification of text documents: a brief review. Int J Comput Appl 2010, RTIPPR:110–119.
Duda, RO, Hart, PE. Pattern Classification and Scene Analysis. New York: John Wiley & Sons; 1973.
Hastie, T, Tibshirani, R, Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. 2nd ed. New York, NY: Springer‐Verlag; 2011.
Hotho, A, Nürnberger, A, Paaß, G. A brief survey of text mining. LDV‐GLDV J Comput Linguist Lang Technol 2005, 20:19–62.
Martinez, AR. A framework for the representation of semantics. PhD Thesis, George Mason University, Fairfax, VA, 2002.
Everitt, BS, Landau, S, Leese, M, Stahl, D. Cluster Analysis. New York: John Wiley & Sons; 2011.
Webb, A. Statistical Pattern Recognition. 2nd ed. Oxford: Oxford University Press; 2002.
Cover, TM, Hart, PE. Nearest neighbor pattern classification. IEEE Trans Inf Theory 1967, 13:21–27.
Breiman, L. Random Forests. Mach Learn 2001, 45:5–32.
Ripley, B. Pattern Recognition and Neural Networks. New York, NY: Cambridge University Press; 1996.
Bishop, CM. Pattern Recognition and Machine Learning. New York, NY: Springer; 2006.
Haykin, S. Neural Networks: A Comprehensive Foundation. New York: Macmillan College Publishing Company; 1994.
Baeza‐Yates, R, Ribeiro‐Neto, B. Modern Information Retrieval. New York: ACM Press; 1999.
Vapnik, VN. The Nature of Statistical Learning Theory. New York, NY: Springer‐Verlag; 1998.
Hearst, MA. Support vector machines. IEEE Intell Syst 1998, July/August:18–28. Available at: http://pages.cs.wisc.edu/~jerryzhu/cs540/handouts/hearst98-SVMtutorial.pdf (Accessed December 2014).
Duda, RO, Hart, PE, Stork, DG. Pattern Classification. 2nd ed. New York: John Wiley & Sons; 2001.
Fraley, C, Raftery, AE. How many clusters? Which clustering method? Answers via model‐based cluster analysis. Comput J 1998, 41:578–588.
Ng, AY, Jordan, MI, Weiss, Y. On spectral clustering: analysis and an algorithm. Adv Neural Inf Process Syst 2002, 14:849–856.
Hofmann, T. Probabilistic latent semantic indexing. In: Proceedings of the 22nd Annual ACM Conference on Research and Development in Information Retrieval, Berkeley, CA, 15–19 August, 1999, 50–57.
Xu, W, Liu, G, Gong, Y. Document clustering based on non‐negative matrix factorization. In: Proceedings of SIGIR'03, Toronto, Canada, 2003, 267–273.
Martinez, WL, Measure, A. Statistical analysis of text in survey records. In: Proceedings of the Federal Committee on Statistical Methodology Research Conference, FCSM, Washington, DC, 2013. Available at: https://fcsm.sites.usa.gov/files/2014/05/C3_Martinez_2013FCSM.pdf (Accessed December 2014).
http://ogesdw.dol.gov/views/data_catalogs.php
Tenenbaum, JB, de Silva, V, Langford, JC. A global geometric framework for nonlinear dimensionality reduction. Science 2000, 290:2319–2323.
Martinez, WL. Text analysis tools for editing and verification. In: UNECE Work Session on Statistical Data Editing, Paris, France, April, 2014. Available at: http://www.unece.org/fileadmin/DAM/stats/documents/ece/ces/ge.44/2014/mtg1/Topic_1_USA_Martinez.pdf (Accessed December 2014).
Kakaes, K. Google, Yahoo! BabelFish use math principles to translate documents online. The Washington Post, February 21, 2011. Available at: http://www.washingtonpost.com/wp-dyn/content/article/2011/02/21/AR2011022102191.html
http://www.lrec-conf.org/lrec2004/doc/jelinek.pdf
Hutchins, WJ. Machine translation over fifty years. Histoire Épistémologie Langage 2001, 23:7–31.
Indurkhya, N, Damerau, FJ. Handbook of Natural Language Processing. 2nd ed. Boca Raton, FL: CRC Press; 2010.
Brown, PF, Della Pietra, VJ, Della Pietra, SA, Mercer, RL. The mathematics of statistical machine translation: parameter estimation. Comput Linguist 1993, 19:263–311.
Martinez, AR. Part‐of‐speech tagging. WIREs Comput Stat 2012, 4:107–113.
Schmid, H. Probabilistic part‐of‐speech tagging using decision trees. In: International Conference on New Methods in Language Processing, Manchester, UK, 1994.
Ratnaparkhi, A. A maximum entropy model for part‐of‐speech tagging. In: Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), Philadelphia, PA, 1996.
Brants, T. TnT—a statistical part‐of‐speech tagger. In: Proceedings of the 6th Conference on Applied Natural Language Processing, Seattle, WA, 2000.
Gale, WA, Church, KW, Yarowsky, D. A method for disambiguating word senses in a corpus. Comput Humanit 1992, 26:415–439.
Schütze, H. Automatic word sense discrimination. Comput Linguist 1998, 24:91–124.
Pantel, P, Lin, D. Discovering word senses from text. In: Proceedings of ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Edmonton, Canada, 2002, 613–619.
Radev, DR, Hovy, E, McKeown, K. Introduction to the special issue on summarization. Comput Linguist 2002, 28:399–408.
Madnani, N, Dorr, BJ. Generating phrasal and sentential paraphrases: a survey of data‐driven methods. Comput Linguist 2010, 36:341–387.
Li, F, Han, C, Huang, M, Zhu, X, Xia, Y, Zhang, S, Yu, H. Structure‐aware review mining and summarization. In: Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China, 2010, 653–661. Available at: http://aclweb.org/anthology/C10-1074 (Accessed December 2014).
Wong, K, Wu, M, Li, W. Extractive summarization using supervised and semi‐supervised learning. In: Proceedings of the 22nd International Conference on Computational Linguistics, Manchester, UK, 2008, 985–992. Available at: http://www.aclweb.org/anthology/C08-1124 (Accessed December 2014).
Li, W, Wu, M, Lu, Q, Xu, W, Yuan, C. Extractive summarization using inter‐ and intra‐event relevance. In: Proceedings of the 21st International Conference on Computational Linguistics, Sydney, Australia, 2006, 369–376.
Page, L, Brin, S, Motwani, R, Winograd, T. The PageRank citation ranking: bringing order to the web. Technical Report, Stanford University, 1998.
http://cran.r-project.org/
http://cran.r-project.org/web/views/NaturalLanguageProcessing.html
Feinerer, I, Hornik, K, Meyer, D. Text mining infrastructure in R. J Stat Softw 2008, 25:1–54. Available at: http://www.jstatsoft.org/v25/i05/ (Accessed December 2014).
https://www.python.org/ (Accessed December 2014).
http://www.nltk.org/
https://radimrehurek.com/gensim/
Bird, S, Klein, E, Loper, E. Natural Language Processing with Python. Sebastopol, CA: O'Reilly Media; 2009.
http://opennlp.apache.org/index.html
Abney, S. Semisupervised Learning for Computational Linguistics. Boca Raton, FL: Chapman & Hall/CRC; 2008.
Baayen, RH. Analyzing Linguistic Data: A Practical Introduction to Statistics Using R. New York: Cambridge University Press; 2008.
Solka, J. Text data mining: theory and methods. Stat Surv 2008, 2:94–112. Available at: http://projecteuclid.org/euclid.ssu/1216238228 (Accessed December 2014).
Berry, MW, ed. Survey of Text Mining: Clustering, Classification, and Retrieval. New York: Springer‐Verlag; 2004.
https://www.ldc.upenn.edu/