Abstract
Synthesis writing is widely taught across domains and serves as an important means of assessing writing ability, text comprehension, and content learning. Synthesis writing differs from other types of writing in both its cognitive and task demands because it requires writers to integrate information across source materials. However, little is known about how the integration of source material influences overall writing quality in synthesis tasks. This study examined approximately 900 source-based essays written in response to four different synthesis prompts, which instructed writers to use information from the sources to illustrate and support their arguments and to clearly indicate which sources they were drawing from (i.e., citation use). The essays were then scored by expert raters for holistic quality, argumentation, and source use/inferencing. Hand-crafted natural language processing (NLP) features and pre-existing NLP tools were used to examine semantic and keyword overlap between the essays and the source texts, plagiarism from the source texts, and instances of source citation and quoting. These variables, along with text length and prompt, were then used to predict essay scores. The resulting models predicted human ratings well, explaining between 47% and 52% of the variance in scores. The results indicate that text length was the strongest predictor of score, but also that more successful writers include stronger, semantically related information from the sources, provide more citations and do so later in the text, and copy less from the source texts. This work introduces the use of NLP techniques to assess source integration, provides details on the types of source integration used by writers, and highlights the effects of source integration on writing quality.
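To make the kinds of source-integration measures named in the abstract more concrete, the following is a minimal sketch in Python. It is not the authors' feature set or tooling: the function names, the 5-word window used as a copying proxy, and the "(Source A)" citation format are all assumptions made for illustration only.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def keyword_overlap(essay, source):
    """Share of distinct essay word types that also occur in the source."""
    essay_types = set(tokenize(essay))
    source_types = set(tokenize(source))
    if not essay_types:
        return 0.0
    return len(essay_types & source_types) / len(essay_types)

def copied_ngram_ratio(essay, source, n=5):
    """Share of essay n-grams found verbatim in the source (a rough copying proxy)."""
    def ngrams(tokens):
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    essay_ngrams = ngrams(tokenize(essay))
    source_ngrams = set(ngrams(tokenize(source)))
    if not essay_ngrams:
        return 0.0
    return sum(g in source_ngrams for g in essay_ngrams) / len(essay_ngrams)

def citation_positions(essay, pattern=r"\(source\s+[a-z0-9]+\)"):
    """Relative positions (0 = start, 1 = end) of assumed '(Source X)' markers."""
    lowered = essay.lower()
    if not lowered:
        return []
    return [m.start() / len(lowered) for m in re.finditer(pattern, lowered)]

# Hypothetical source text and essay, invented for this example.
source = "Solar panels convert sunlight into electricity without producing emissions."
essay = ("Solar panels convert sunlight into electricity without producing "
         "emissions (Source A). I think this matters.")

overlap = keyword_overlap(essay, source)
copied = copied_ngram_ratio(essay, source)
positions = citation_positions(essay)
```

In a setup like this, the three values could be combined with text length and prompt as predictors in a regression model. Semantic overlap, which the abstract also mentions, would need a vector-space model and is left out of this sketch.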
Notes
Our initial corpus contained 919 texts. We removed ten texts because participants either did not write on topic or copied and pasted the entire essay from available sources.
Acknowledgements
This research was supported in part by the Institute for Education Sciences (IES R305A180261 and R305A180144) and the Office of Naval Research (N00014-20-1-2623). The ideas expressed in this material are those of the authors and do not necessarily reflect the views of our funders.
About this article
Cite this article
Crossley, S., Wan, Q., Allen, L. et al. Source inclusion in synthesis writing: an NLP approach to understanding argumentation, sourcing, and essay quality. Read Writ 36, 1053–1083 (2023). https://doi.org/10.1007/s11145-021-10221-x
DOI: https://doi.org/10.1007/s11145-021-10221-x