Analyzing Spoken and Written Discourse: A Role for Natural Language Processing Tools

Crossley, Scott A.; Kyle, Kristopher

doi:10.1057/978-1-137-59900-1_25

Abstract

This chapter gives a general overview of research methods used to analyze spoken and written discourse. It then focuses on how natural language processing (NLP) tools that measure lexical, syntactic, rhetorical, and cohesion features of text can be used to examine such discourse. Specifically, the chapter reviews how NLP tools have been used in previous discourse studies, introduces freely available tools, describes the output these tools produce, and outlines statistical methods for analyzing and interpreting that output.

Author information

Authors and Affiliations

Department of Applied Linguistics & ESL, Georgia State University, Atlanta, GA, USA
Scott A. Crossley
Department of Second Language Studies, University of Hawai’i at Manoa, Honolulu, HI, USA
Kristopher Kyle

Corresponding author

Correspondence to Scott A. Crossley.

Editor information

Editors and Affiliations

Sydney School of Education and Social Work, University of Sydney, Sydney, NSW, Australia
Aek Phakiti
Department of Linguistics, Germanic, Slavic, Asian and African Languages, Michigan State University, East Lansing, MI, USA
Peter De Costa
Applied Linguistics, Northern Arizona University, Flagstaff, AZ, USA
Luke Plonsky
School of Education, UNSW Sydney, Sydney, NSW, Australia
Sue Starfield

About this chapter

Cite this chapter

Crossley, S.A., Kyle, K. (2018). Analyzing Spoken and Written Discourse: A Role for Natural Language Processing Tools. In: Phakiti, A., De Costa, P., Plonsky, L., Starfield, S. (eds) The Palgrave Handbook of Applied Linguistics Research Methodology. Palgrave Macmillan, London. https://doi.org/10.1057/978-1-137-59900-1_25

DOI: https://doi.org/10.1057/978-1-137-59900-1_25
Publisher Name: Palgrave Macmillan, London
Print ISBN: 978-1-137-59899-8
Online ISBN: 978-1-137-59900-1
eBook Packages: Social Sciences (R0)