ABSTRACT
Curriculum analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. To gain statistical robustness, most existing CA techniques rely on the assumption of time-invariant course difficulty, which prevents them from capturing variations that occur over time. However, keeping temporal variation in course difficulty low is crucial to guarantee fair treatment of individual student cohorts and consistency in degree outcomes. We introduce item response theory (IRT) as a CA methodology that enables us to address the open problem of monitoring course difficulty variations over time. We use statistical criteria to quantify the degree to which course performance data meet IRT's theoretical assumptions and to verify the validity and reliability of IRT-based course difficulty estimates. Using data from 664 Computer Science and 1,355 Mechanical Engineering undergraduate students, we show how IRT can yield valuable CA insights: First, by revealing temporal variations in course difficulty over several years, we find that course difficulty systematically shifted downward during the COVID-19 pandemic. Second, time-dependent variations in course difficulty and cohort performance confound conventional course pass-rate measures; we introduce IRT-adjusted pass rates as an alternative that accounts for these factors. Our findings are relevant to policymakers, student advisors, accreditation, and course articulation.
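To make the estimation step concrete, the sketch below fits a Rasch (one-parameter) IRT model to a binary student-by-course pass matrix and derives IRT-adjusted pass rates. Everything here is an illustrative assumption rather than the paper's implementation: the data is synthetic, the joint maximum-likelihood optimizer is a plain gradient ascent, and an equivalent analysis could be run with an established IRT package such as R's mirt.

```python
# Minimal sketch (illustrative, not the paper's implementation): estimating
# course difficulty from binary pass/fail records with a Rasch (1PL) model,
#   P(pass_ij) = sigmoid(theta_i - b_j),
# then computing IRT-adjusted pass rates. All data and settings are assumed.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic records: rows = students, columns = courses; NaN = not taken.
n_students, n_courses = 500, 12
true_theta = rng.normal(0.0, 1.0, n_students)   # latent student ability
true_b = rng.normal(0.0, 1.0, n_courses)        # latent course difficulty
p_true = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_b[None, :])))
X = (rng.random((n_students, n_courses)) < p_true).astype(float)
X[rng.random(X.shape) > 0.8] = np.nan           # ~20% of courses not taken

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Joint maximum likelihood via gradient ascent on the Rasch log-likelihood.
obs = ~np.isnan(X)
theta = np.zeros(n_students)
b = np.zeros(n_courses)
lr = 0.5
for _ in range(500):
    p_hat = sigmoid(theta[:, None] - b[None, :])
    resid = np.where(obs, X - p_hat, 0.0)       # zero out unobserved cells
    theta += lr * resid.sum(axis=1) / np.maximum(obs.sum(axis=1), 1)
    b -= lr * resid.sum(axis=0) / np.maximum(obs.sum(axis=0), 1)
    c = b.mean()                                # identification: mean b = 0,
    b -= c                                      # shift theta by the same
    theta -= c                                  # constant to preserve the fit

# IRT-adjusted pass rate: expected pass probability of a fixed reference
# cohort (here, the pooled ability estimates) for each course. Comparing it
# across years removes cohort-strength effects from raw pass rates.
adjusted_pass_rate = sigmoid(theta[:, None] - b[None, :]).mean(axis=0)
raw_pass_rate = np.nanmean(X, axis=0)           # confounded by cohort strength

print("estimated difficulty:", np.round(b, 2))
print("adjusted pass rate  :", np.round(adjusted_pass_rate, 2))
```

In the longitudinal setting studied in the paper, the difficulty parameter would be estimated per course offering (course by semester or year), placing all offerings on a common latent scale so that temporal shifts, such as those observed during the COVID-19 pandemic, become directly comparable.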
Recommendations
What We Evaluate When We Evaluate Recommender Systems: Understanding Recommender Systems’ Performance using Item Response Theory
RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
Current practices in offline evaluation use rank-based metrics to measure the quality of top-n recommendation lists. This approach has practical benefits as it centres assessment on the output of the recommender system and, therefore, measures ...
Application of Item Response Theory to Collaborative Filtering
ISNN '09: Proceedings of the 6th International Symposium on Neural Networks on Advances in Neural Networks
Although many approaches to collaborative filtering have been proposed, few have considered the data quality of recommender systems. Measurement is imprecise, and the rating data given by users is a distorted reflection of their true preferences. This paper describes how ...
Using item response theory to generate an item pool for an e-learning-system
LAK '17: Proceedings of the Seventh International Learning Analytics & Knowledge Conference
This paper demonstrates how the application of item response theory yields useful item characteristics, which can further serve as the foundation of item pools and thus of adaptive educational software to come.