Abstract
This chapter gives a general overview of research methods used to analyze spoken and written discourse. It then focuses on how natural language processing (NLP) tools that measure lexical, syntactic, rhetorical, and cohesion features of text can be used to examine such discourse. It reviews how NLP tools have been used in previous studies of discourse, introduces freely available tools, describes the output these tools produce, and outlines statistical methods for analyzing and interpreting that output.
Copyright information
© 2018 The Author(s)
About this chapter
Cite this chapter
Crossley, S.A., Kyle, K. (2018). Analyzing Spoken and Written Discourse: A Role for Natural Language Processing Tools. In: Phakiti, A., De Costa, P., Plonsky, L., Starfield, S. (eds) The Palgrave Handbook of Applied Linguistics Research Methodology. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-137-59900-1_25
DOI: https://doi.org/10.1057/978-1-137-59900-1_25
Publisher Name: Palgrave Macmillan, London
Print ISBN: 978-1-137-59899-8
Online ISBN: 978-1-137-59900-1
eBook Packages: Social Sciences (R0)