ABSTRACT
Curriculum analytics (CA) studies curriculum structure and student data to ensure the quality of educational programs. To gain statistical robustness, most existing CA techniques rely on the assumption of time-invariant course difficulty, which prevents them from capturing variations that occur over time. However, keeping temporal variation in course difficulty low is crucial to guarantee fair treatment of individual student cohorts and consistency in degree outcomes. We introduce item response theory (IRT) as a CA methodology that enables us to address the open problem of monitoring course difficulty variations over time. We use statistical criteria to quantify the degree to which course performance data meet IRT's theoretical assumptions and to verify the validity and reliability of IRT-based course difficulty estimates. Using data from 664 Computer Science and 1,355 Mechanical Engineering undergraduate students, we show how IRT can yield valuable CA insights: First, by revealing temporal variations in course difficulty over several years, we find that course difficulty systematically shifted downward during the COVID-19 pandemic. Second, time-dependent variations in course difficulty and cohort performance confound conventional course pass-rate measures; we introduce IRT-adjusted pass rates as an alternative that accounts for these factors. Our findings are relevant to policymakers, student advisors, accreditation, and course articulation.
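To make the estimation step concrete, the sketch below fits a Rasch (one-parameter) IRT model to a binary student-by-course pass matrix and derives IRT-adjusted pass rates. Everything here is an illustrative assumption rather than the paper's implementation: the data is synthetic, the joint maximum-likelihood optimizer is a plain gradient ascent, and an equivalent analysis could be run with an established IRT package such as R's mirt.

```python
# Minimal sketch (illustrative, not the paper's implementation): estimating
# course difficulty from binary pass/fail records with a Rasch (1PL) model,
#   P(pass_ij) = sigmoid(theta_i - b_j),
# then computing IRT-adjusted pass rates. All data and settings are assumed.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic records: rows = students, columns = courses; NaN = not taken.
n_students, n_courses = 500, 12
true_theta = rng.normal(0.0, 1.0, n_students)   # latent student ability
true_b = rng.normal(0.0, 1.0, n_courses)        # latent course difficulty
p_true = 1.0 / (1.0 + np.exp(-(true_theta[:, None] - true_b[None, :])))
X = (rng.random((n_students, n_courses)) < p_true).astype(float)
X[rng.random(X.shape) > 0.8] = np.nan           # ~20% of courses not taken

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Joint maximum likelihood via gradient ascent on the Rasch log-likelihood.
obs = ~np.isnan(X)
theta = np.zeros(n_students)
b = np.zeros(n_courses)
lr = 0.5
for _ in range(500):
    p_hat = sigmoid(theta[:, None] - b[None, :])
    resid = np.where(obs, X - p_hat, 0.0)       # zero out unobserved cells
    theta += lr * resid.sum(axis=1) / np.maximum(obs.sum(axis=1), 1)
    b -= lr * resid.sum(axis=0) / np.maximum(obs.sum(axis=0), 1)
    c = b.mean()                                # identification: mean b = 0,
    b -= c                                      # shift theta by the same
    theta -= c                                  # constant to preserve the fit

# IRT-adjusted pass rate: expected pass probability of a fixed reference
# cohort (here, the pooled ability estimates) for each course. Comparing it
# across years removes cohort-strength effects from raw pass rates.
adjusted_pass_rate = sigmoid(theta[:, None] - b[None, :]).mean(axis=0)
raw_pass_rate = np.nanmean(X, axis=0)           # confounded by cohort strength

print("estimated difficulty:", np.round(b, 2))
print("adjusted pass rate  :", np.round(adjusted_pass_rate, 2))
```

In the longitudinal setting studied in the paper, the difficulty parameter would be estimated per course offering (course by semester or year), placing all offerings on a common latent scale so that temporal shifts, such as those observed during the COVID-19 pandemic, become directly comparable.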
Recommendations
What We Evaluate When We Evaluate Recommender Systems: Understanding Recommender Systems’ Performance using Item Response Theory
RecSys '23: Proceedings of the 17th ACM Conference on Recommender Systems
Current practices in offline evaluation use rank-based metrics to measure the quality of top-n recommendation lists. This approach has practical benefits as it centres assessment on the output of the recommender system and, therefore, measures ...
Application of Item Response Theory to Collaborative Filtering
ISNN '09: Proceedings of the 6th International Symposium on Neural Networks on Advances in Neural Networks
Although many approaches to collaborative filtering have been proposed, few have considered the data quality of recommender systems. Measurement is imprecise, and the rating data given by users is a distorted reflection of their true preferences. This paper describes how ...
Using item response theory to generate an item pool for an e-learning-system
LAK '17: Proceedings of the Seventh International Learning Analytics & Knowledge Conference
This paper demonstrates how the application of item response theory yields useful item characteristics, which can further serve as the foundation of item pools and thus of adaptive educational software to come.