
Evaluating the Construct Validity of an Automated Writing Evaluation System with a Randomization Algorithm

International Journal of Artificial Intelligence in Education

Abstract

This study evaluated the construct validity of six scoring traits of an automated writing evaluation (AWE) system called MI Write. Persuasive essays (N = 100) written by students in grades 7 and 8 were randomized at the sentence level using a script built on Python’s NLTK module. Each essay was randomized 30 times (n = 3,000 total randomizations), and the mean trait scores for each set of randomized iterations were compared with those of the control text across all traits. We were specifically interested in the effects of randomization on the high-level traits of idea development and organization. Given the rubrics and qualitative feedback provided by MI Write, we hypothesized that these high-level traits should be sensitive to sentence-level randomization (i.e., scores should decrease). Overall, complete randomization did not consistently produce significant decreases in scores for these high-level traits. In fact, more than a third of the essays showed significant increases in one or both high-level traits despite randomization, indicating a disconnect between MI Write’s formative feedback and its underlying constructs. These findings have implications for both consumers and developers of AWE.
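The sentence-level randomization described above can be sketched as follows. The published script used Python’s NLTK module; to keep this sketch self-contained, a simple regex splitter stands in for NLTK’s sentence tokenizer, and the function name and seed handling are illustrative assumptions, not the authors’ actual implementation.

```python
import random
import re

def randomize_sentences(text: str, seed: int) -> str:
    """Shuffle the sentences of an essay while leaving each sentence intact.

    A regex on terminal punctuation stands in for NLTK's sent_tokenize so
    the sketch has no external dependencies.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    rng = random.Random(seed)  # one seed per iteration, for reproducibility
    rng.shuffle(sentences)
    return " ".join(sentences)

essay = ("I believe that computers benefit our society. "
         "They teach hand-eye coordination. Lastly, you can talk to people.")
# Generate 30 randomized iterations per essay, mirroring the study design.
iterations = [randomize_sentences(essay, seed=i) for i in range(30)]
```

Each iteration preserves every sentence verbatim and only permutes their order, which is exactly the manipulation the high-level traits of idea development and organization should penalize.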


Fig. 1



Author information


Corresponding author

Correspondence to Matthew C. Myers.

Ethics declarations

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors declare no conflicts of interest relevant to the content of this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A


Appendix B

Sample Persuasive Essay #53 – Nonrandomized (i.e., Control)

I believe that computers benefit our society. They teach hand–eye coordination. They also help us learn about faraway places and people. Lastly, you can talk to people. These are all things that benefit our society.

A good benefit in computers is that it teaches you hand–eye coordination. It helps you practice using your hands. That can benefit somebody because once you get alot of practice at it, you can type really fast. This will save you alot of time. Personally, I think that writing makes you get cramps in your hands and takes much longer to do rather than typing. It also benefits you because once you know where all the keys are and you can type without looking then you can multi-task. Once you get really good at typing you don't have to worry about having to look at the keys, you can type and do something else at the same time. This is how computers teach hand–eye coordination.

Another benefit about computers is that you can learn about far away places and people. A long time ago when I moved from New York, I had to leave all my closest friends. They were like family to me. I was really upset. Then a few years later when I was allowed to get a computer and a facebook, I found all of them. I was so joyful. Then we exchanged numbers and made plans to hang out. It was incredibly fun to see them again. I also have learned about far away places. There was an incident happening in Africa where this guy was taking kids as soldiers. I thought that we could make a difference by helping them out and stopping the bad man. Without computers alot of people wouldn't even know about the incident. This is how computers help bring friendships back together and help people in need.

Lastly, computers benefit the society because you can talk to people. If you have problems and you need to talk to somebody you can trust, then you could go on the computer and chat with them. Instead of walking to their house or calling them, you can just type to them. I think it is 10 times easier to just type especially after a long hard day of school and work. Another benefit would be that if you have an emergency you can quickly and easily type to somebody for help. They would get the messege quick and will be able to respond quickly. Calling them is annoying because what if they dont have their cellphone or maybe their phone died. It is just alot easier to type. Also, if you are ever bored you can just talk to them and maybe you'll have a good conversation. This is why I think that computers benefit the society.

This is why I think that computers benefit the society. They help your coordination. Which can possibly help you in a crisis. They will also help you learn about places and people that are far away. Which can bring back friendships and help people far away. And finally, you can talk to people on them. These are the three reasons why I think that computers benefit out society.

Total Score: 23.5

Idea Development: 3.7
Word Choice: 4.0
Organization: 3.6
Sentence Fluency: 4.0
Style: 4.0
Conventions: 4.2

Appendix C

Sample Persuasive Essay #53 – Randomized Iteration #17

A long time ago when I moved from New York, I had to leave all my closest friends. Which can possibly help you in a crisis. Lastly, computers benefit the society because you can talk to people. Calling them is annoying because what if they dont have their cellphone or maybe their phone died. They help your coordination.

This is why I think that computers benefit the society. Then a few years later when I was allowed to get a computer and a facebook, I found all of them. Personally, I think that writing makes you get cramps in your hands and takes much longer to do rather than typing. I thought that we could make a difference by helping them out and stopping the bad man. This is why I think that computers benefit the society. I think it is 10 times easier to just type especially after a long hard day of school and work. That can benefit somebody because once you get alot of practice at it, you can type really fast. Another benefit would be that if you have an emergency you can quickly and easily type to somebody for help.

I was really upset. There was an incident happening in Africa where this guy was taking kids as soldiers. It also benefits you because once you know where all the keys are and you can type without looking then you can multi-task. It was incredibly fun to see them again. It is just alot easier to type. Which can bring back friendships and help people far away. Then we exchanged numbers and made plans to hang out. Instead of walking to their house or calling them, you can just type to them. It helps you practice using your hands. This will save you alot of time. I also have learned about far away places. Lastly, you can talk to people. A good benefit in computers is that it teaches you hand–eye coordination.

I believe that computers benefit our society. These are the three reasons why I think that computers benefit out society. And finally, you can talk to people on them. If you have problems and you need to talk to somebody you can trust, then you could go on the computer and chat with them. They will also help you learn about places and people that are far away. They also help us learn about faraway places and people. Another benefit about computers is that you can learn about far away places and people. Without computers alot of people wouldn't even know about the incident. They were like family to me. They would get the messege quick and will be able to respond quickly.

These are all things that benefit our society. This is how computers help bring friendships back together and help people in need. Also, if you are ever bored you can just talk to them and maybe you'll have a good conversation. They teach hand–eye coordination. Once you get really good at typing you don't have to worry about having to look at the keys, you can type and do something else at the same time. I was so joyful. This is how computers teach hand–eye coordination.

Total Score: 23.7

Idea Development: 3.7
Word Choice: 4.1
Organization: 3.6
Sentence Fluency: 4.0
Style: 4.0
Conventions: 4.3
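The comparison that Appendices B and C illustrate, a control essay’s trait score against its randomized iterations, can be sketched as a one-sample t-test. The scores below are hypothetical stand-ins (not data from the study), and the statistic is computed with the standard library rather than a stats package.

```python
import math
import statistics

def one_sample_t(iteration_scores, control_score):
    """t statistic for H0: the mean randomized trait score equals the control score."""
    n = len(iteration_scores)
    mean = statistics.fmean(iteration_scores)
    sd = statistics.stdev(iteration_scores)  # sample standard deviation
    t = (mean - control_score) / (sd / math.sqrt(n))
    return mean, t

# Hypothetical Organization scores for six randomized iterations of one essay.
scores = [3.6, 3.7, 3.5, 3.8, 3.6, 3.7]
mean, t = one_sample_t(scores, control_score=3.6)
```

A positive t here would mean the randomized iterations scored *higher* on average than the intact control, the counterintuitive pattern the study reports for more than a third of the essays.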


About this article


Cite this article

Myers, M.C., Wilson, J. Evaluating the Construct Validity of an Automated Writing Evaluation System with a Randomization Algorithm. Int J Artif Intell Educ 33, 609–634 (2023). https://doi.org/10.1007/s40593-022-00301-6

