
Evaluating the Construct Validity of an Automated Writing Evaluation System with a Randomization Algorithm

International Journal of Artificial Intelligence in Education

Abstract

This study evaluated the construct validity of six scoring traits of an automated writing evaluation (AWE) system called MI Write. Persuasive essays (N = 100) written by students in grades 7 and 8 were randomized at the sentence level using a script built on Python’s NLTK module. Each essay was randomized 30 times (n = 3,000 total randomizations), and the mean trait scores for each set of randomized iterations were compared with those of the control text across all traits. We were specifically interested in the effects of randomization on the high-level traits of idea development and organization. Given the rubrics and qualitative feedback provided by MI Write, we hypothesized that these high-level traits should be sensitive to sentence-level randomization (i.e., scores should decrease). Overall, complete randomization did not consistently produce significant decreases in scores for these high-level traits. In fact, more than a third of the essays showed significant increases in one or both high-level traits despite randomization, indicating a disconnect between MI Write’s formative feedback and its underlying constructs. These findings have implications for both consumers and developers of AWE.
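The sentence-level randomization described above can be sketched as follows. The published script used Python’s NLTK module; to keep this sketch self-contained, a simple regex splitter stands in for NLTK’s sentence tokenizer, and the function name and seed handling are illustrative assumptions, not the authors’ actual implementation.

```python
import random
import re

def randomize_sentences(text: str, seed: int) -> str:
    """Shuffle the sentences of an essay while leaving each sentence intact.

    A regex on terminal punctuation stands in for NLTK's sent_tokenize so
    the sketch has no external dependencies.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    rng = random.Random(seed)  # one seed per iteration, for reproducibility
    rng.shuffle(sentences)
    return " ".join(sentences)

essay = ("I believe that computers benefit our society. "
         "They teach hand-eye coordination. Lastly, you can talk to people.")
# Generate 30 randomized iterations per essay, mirroring the study design.
iterations = [randomize_sentences(essay, seed=i) for i in range(30)]
```

Each iteration preserves every sentence verbatim and only permutes their order, which is exactly the manipulation the high-level traits of idea development and organization should penalize.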


Fig. 1



Author information


Corresponding author

Correspondence to Matthew C. Myers.

Ethics declarations

This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors. The authors declare no conflicts of interest relevant to the content of this study.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix A


Appendix B

Sample Persuasive Essay #53 – Nonrandomized (i.e., Control)

I believe that computers benefit our society. They teach hand–eye coordination. They also help us learn about faraway places and people. Lastly, you can talk to people. These are all things that benefit our society.

A good benefit in computers is that it teaches you hand–eye coordination. It helps you practice using your hands. That can benefit somebody because once you get alot of practice at it, you can type really fast. This will save you alot of time. Personally, I think that writing makes you get cramps in your hands and takes much longer to do rather than typing. It also benefits you because once you know where all the keys are and you can type without looking then you can multi-task. Once you get really good at typing you don't have to worry about having to look at the keys, you can type and do something else at the same time. This is how computers teach hand–eye coordination.

Another benefit about computers is that you can learn about far away places and people. A long time ago when I moved from New York, I had to leave all my closest friends. They were like family to me. I was really upset. Then a few years later when I was allowed to get a computer and a facebook, I found all of them. I was so joyful. Then we exchanged numbers and made plans to hang out. It was incredibly fun to see them again. I also have learned about far away places. There was an incident happening in Africa where this guy was taking kids as soldiers. I thought that we could make a difference by helping them out and stopping the bad man. Without computers alot of people wouldn't even know about the incident. This is how computers help bring friendships back together and help people in need.

Lastly, computers benefit the society because you can talk to people. If you have problems and you need to talk to somebody you can trust, then you could go on the computer and chat with them. Instead of walking to their house or calling them, you can just type to them. I think it is 10 times easier to just type especially after a long hard day of school and work. Another benefit would be that if you have an emergency you can quickly and easily type to somebody for help. They would get the messege quick and will be able to respond quickly. Calling them is annoying because what if they dont have their cellphone or maybe their phone died. It is just alot easier to type. Also, if you are ever bored you can just talk to them and maybe you'll have a good conversation. This is why I think that computers benefit the society.

This is why I think that computers benefit the society. They help your coordination. Which can possibly help you in a crisis. They will also help you learn about places and people that are far away. Which can bring back friendships and help people far away. And finally, you can talk to people on them. These are the three reasons why I think that computers benefit out society.

Total Score: 23.5

Idea Development: 3.7
Word Choice: 4.0
Organization: 3.6
Sentence Fluency: 4.0
Style: 4.0
Conventions: 4.2

Appendix C

Sample Persuasive Essay #53 – Randomized Iteration #17

A long time ago when I moved from New York, I had to leave all my closest friends. Which can possibly help you in a crisis. Lastly, computers benefit the society because you can talk to people. Calling them is annoying because what if they dont have their cellphone or maybe their phone died. They help your coordination.

This is why I think that computers benefit the society. Then a few years later when I was allowed to get a computer and a facebook, I found all of them. Personally, I think that writing makes you get cramps in your hands and takes much longer to do rather than typing. I thought that we could make a difference by helping them out and stopping the bad man. This is why I think that computers benefit the society. I think it is 10 times easier to just type especially after a long hard day of school and work. That can benefit somebody because once you get alot of practice at it, you can type really fast. Another benefit would be that if you have an emergency you can quickly and easily type to somebody for help.

I was really upset. There was an incident happening in Africa where this guy was taking kids as soldiers. It also benefits you because once you know where all the keys are and you can type without looking then you can multi-task. It was incredibly fun to see them again. It is just alot easier to type. Which can bring back friendships and help people far away. Then we exchanged numbers and made plans to hang out. Instead of walking to their house or calling them, you can just type to them. It helps you practice using your hands. This will save you alot of time. I also have learned about far away places. Lastly, you can talk to people. A good benefit in computers is that it teaches you hand–eye coordination.

I believe that computers benefit our society. These are the three reasons why I think that computers benefit out society. And finally, you can talk to people on them. If you have problems and you need to talk to somebody you can trust, then you could go on the computer and chat with them. They will also help you learn about places and people that are far away. They also help us learn about faraway places and people. Another benefit about computers is that you can learn about far away places and people. Without computers alot of people wouldn't even know about the incident. They were like family to me. They would get the messege quick and will be able to respond quickly.

These are all things that benefit our society. This is how computers help bring friendships back together and help people in need. Also, if you are ever bored you can just talk to them and maybe you'll have a good conversation. They teach hand–eye coordination. Once you get really good at typing you don't have to worry about having to look at the keys, you can type and do something else at the same time. I was so joyful. This is how computers teach hand–eye coordination.

Total Score: 23.7

Idea Development: 3.7
Word Choice: 4.1
Organization: 3.6
Sentence Fluency: 4.0
Style: 4.0
Conventions: 4.3
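The comparison that Appendices B and C illustrate, a control essay’s trait score against its randomized iterations, can be sketched as a one-sample t-test. The scores below are hypothetical stand-ins (not data from the study), and the statistic is computed with the standard library rather than a stats package.

```python
import math
import statistics

def one_sample_t(iteration_scores, control_score):
    """t statistic for H0: the mean randomized trait score equals the control score."""
    n = len(iteration_scores)
    mean = statistics.fmean(iteration_scores)
    sd = statistics.stdev(iteration_scores)  # sample standard deviation
    t = (mean - control_score) / (sd / math.sqrt(n))
    return mean, t

# Hypothetical Organization scores for six randomized iterations of one essay.
scores = [3.6, 3.7, 3.5, 3.8, 3.6, 3.7]
mean, t = one_sample_t(scores, control_score=3.6)
```

A positive t here would mean the randomized iterations scored *higher* on average than the intact control, the counterintuitive pattern the study reports for more than a third of the essays.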


About this article


Cite this article

Myers, M.C., Wilson, J. Evaluating the Construct Validity of an Automated Writing Evaluation System with a Randomization Algorithm. Int J Artif Intell Educ 33, 609–634 (2023). https://doi.org/10.1007/s40593-022-00301-6

