Source inclusion in synthesis writing: an NLP approach to understanding argumentation, sourcing, and essay quality

Reading and Writing

Abstract

Synthesis writing is widely taught across domains and serves as an important means of assessing writing ability, text comprehension, and content learning. Synthesis writing differs from other types of writing in both cognitive and task demands because it requires writers to integrate information across source materials. However, little is known about how the integration of source material may influence overall writing quality in synthesis tasks. This study examined approximately 900 source-based essays written in response to four different synthesis prompts that instructed writers to use information from the sources to illustrate and support their arguments and to clearly indicate which sources they were drawing from (i.e., citation use). The essays were then scored by expert raters for holistic quality, argumentation, and source use/inferencing. Hand-crafted natural language processing (NLP) features and pre-existing NLP tools were used to examine semantic and keyword overlap between the essays and the source texts, plagiarism from the source texts, and instances of source citation and quoting. These variables, along with text length and prompt, were then used to predict essay scores. The resulting models of human ratings were strong, explaining between 47% and 52% of the variance in scores. The results indicate that text length was the strongest predictor of score, but also that more successful writers include stronger, semantically related information from the sources, provide more citations and place them later in the text, and copy less from the sources. This work introduces the use of NLP techniques to assess source integration, provides details on the types of source integration used by writers, and highlights the effects of source integration on writing quality.
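To make the feature types named above more concrete, the following Python sketch computes simplified versions of several of them for a single essay. These are length in words, keyword overlap with a source, word-trigram copying from the sources, and the number and relative position of citations and quotations. This is not the authors' pipeline, which combined hand-crafted features with existing NLP tools not detailed here. The following are all illustrative assumptions:

- the function names;
- the "4+ letters" keyword cutoff;
- the trigram containment measure;
- the citation pattern, which assumes citations look like "Source A" or "(Smith, 2019)".

Semantic overlap, which usually relies on embedding or similar models, is omitted.

```python
import re
from collections import Counter

TOKEN_RE = re.compile(r"[a-z']+")
# Hypothetical citation pattern: prompt-style labels ("Source A") or
# parenthetical author-year references ("(Smith, 2019)").
CITATION_RE = re.compile(r"\bSource [A-Z]\b|\([A-Z][A-Za-z]+,? \d{4}\)")
QUOTE_RE = re.compile(r'"[^"]+"')


def tokenize(text: str) -> list[str]:
    """Lowercased word tokens; punctuation and digits are dropped."""
    return TOKEN_RE.findall(text.lower())


def word_ngrams(tokens: list[str], n: int) -> set[tuple[str, ...]]:
    """Distinct contiguous word n-grams."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def keyword_overlap(essay: str, source: str, top_k: int = 10) -> float:
    """Share of the source's top_k most frequent words (4+ letters) that the essay also uses."""
    counts = Counter(t for t in tokenize(source) if len(t) >= 4)
    keywords = [w for w, _ in counts.most_common(top_k)]
    if not keywords:
        return 0.0
    essay_vocab = set(tokenize(essay))
    return sum(w in essay_vocab for w in keywords) / len(keywords)


def ngram_containment(essay: str, sources: list[str], n: int = 3) -> float:
    """Proportion of the essay's distinct word n-grams found in any source (a copying proxy)."""
    essay_grams = word_ngrams(tokenize(essay), n)
    if not essay_grams:
        return 0.0
    source_grams = set().union(*(word_ngrams(tokenize(s), n) for s in sources))
    return len(essay_grams & source_grams) / len(essay_grams)


def extract_features(essay: str, sources: list[str]) -> dict[str, float]:
    """Hypothetical per-essay feature vector for a downstream scoring model."""
    citations = list(CITATION_RE.finditer(essay))
    # Relative character position in [0, 1): larger values mean the citation appears later.
    positions = [m.start() / len(essay) for m in citations]
    return {
        "n_words": len(tokenize(essay)),
        "keyword_overlap": max((keyword_overlap(essay, s) for s in sources), default=0.0),
        "copy_containment": ngram_containment(essay, sources),
        "n_citations": len(citations),
        "mean_citation_position": sum(positions) / len(positions) if positions else 0.0,
        "n_quotes": len(QUOTE_RE.findall(essay)),
    }


essay = 'Schools should start later. According to Source A, "teens need more sleep" and Source B agrees.'
sources = ["Teens need more sleep than adults. Sleep helps teens learn."]
print(extract_features(essay, sources))
```

The features are read as follows:

- `mean_citation_position` closer to 1 means citations cluster toward the end of the essay, the direction associated with more successful writers in this study.
- A higher `copy_containment` indicates more verbatim borrowing from the sources.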
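The reported model fit (47% to 52% of score variance explained) is a coefficient of determination, R² = 1 - SS_res / SS_tot. The Python sketch below illustrates that quantity with a single-predictor least-squares fit of score on text length. The four essays are invented toy numbers, not study data. The study's models used many more predictors, including prompt, so this example shows only what "variance explained" means, not how the reported models were built.

```python
from statistics import mean


def fit_simple_ols(x: list[float], y: list[float]) -> tuple[float, float]:
    """Least-squares intercept and slope for y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(y: list[float], y_hat: list[float]) -> float:
    """Proportion of variance in y explained by predictions y_hat: 1 - SS_res / SS_tot."""
    my = mean(y)
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


# Toy data (invented): essay length in words and holistic score.
lengths = [200, 300, 400, 500]
scores = [2, 4, 3, 5]
intercept, slope = fit_simple_ols(lengths, scores)
predicted = [intercept + slope * n for n in lengths]
print(round(r_squared(scores, predicted), 2))  # 0.64
```

An R² of 0.64 here means the length-only line accounts for 64% of the spread in these four toy scores; the study's 0.47 to 0.52 values have the same interpretation for its full models.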


Fig. 1 Source use/inferencing scores by group

Fig. 2


Notes

  1. Our initial corpus comprised 919 texts. We removed ten texts because participants either did not write on topic or copied and pasted the entire essay from the available sources.
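Screening for wholesale copying of the kind described in this note could in principle be partly automated. The note does not say how the ten texts were identified. The 0.9 threshold, whitespace tokenization, and function names in this Python sketch are hypothetical. It flags an essay when nearly all of its tokens fall in verbatim runs matched, in order, against a single source, using the standard library's `difflib.SequenceMatcher`.

```python
from difflib import SequenceMatcher


def copied_proportion(essay: str, source: str) -> float:
    """Share of essay tokens lying in blocks matched verbatim, in order, against the source."""
    essay_tokens = essay.lower().split()
    source_tokens = source.lower().split()
    if not essay_tokens:
        return 0.0
    # autojunk=False stops difflib from treating very frequent tokens in long sources as junk.
    matcher = SequenceMatcher(None, essay_tokens, source_tokens, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    return matched / len(essay_tokens)


def flag_wholesale_copy(essay: str, sources: list[str], threshold: float = 0.9) -> bool:
    """Hypothetical screen: True if any single source accounts for >= threshold of the essay."""
    return any(copied_proportion(essay, s) >= threshold for s in sources)
```

Because matching blocks must appear in the same order in both texts, an essay that pastes source paragraphs in a shuffled order would score lower than its true overlap. A flag like this is therefore best treated as a prompt for manual review rather than an automatic exclusion.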


Acknowledgements

This research was supported in part by the Institute for Education Sciences (IES R305A180261 and R305A180144) and the Office of Naval Research (N00014-20-1-2623). The ideas expressed in this material are those of the authors and do not necessarily reflect the views of our funders.

Author information

Correspondence to Scott Crossley.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A

See Table 8.

Table 8 Prompt and assignment information

Appendix B

See Table 9.

Table 9 Integrated essay scoring guidelines

Appendix C

See Table 10.

Table 10 NLP features used in modeling


About this article


Cite this article

Crossley, S., Wan, Q., Allen, L. et al. Source inclusion in synthesis writing: an NLP approach to understanding argumentation, sourcing, and essay quality. Read Writ 36, 1053–1083 (2023). https://doi.org/10.1007/s11145-021-10221-x


  • DOI: https://doi.org/10.1007/s11145-021-10221-x
