Abstract
Student attrition is one of the most frequently stated problems with massive open online courses. Although there is a growing body of research that investigates the factors leading to withdrawing from a course, there is also a need for data-driven solutions for early detection as a means to take remedial action. In this case study, we examine the use of several data preprocessing techniques to model attrition on the basis of students’ interactions with course materials and resources. Data for this study were obtained using the Open University Learning Analytics Dataset (OULAD), enabling the analysis of daily summaries of clickstream data. The data were segmented using a variable-sized overlapping window to take into account assignment submission dates as a context-sensitive factor for student attrition. In each sliding time window, features were extracted to characterize student interactions with curricular materials and converted to a set of linearly uncorrelated principal components. The analysis demonstrates that relatively accurate detection of the likelihood of students dropping out from a course can be attained within approximately 10 weeks from the course start date or before completion of assignments worth 20% of the final grade. Although a decision tree model outperformed alternative approaches to model student attrition, a log-linear model performed comparably well on the sparse representation of student interactions obtained through principal components analysis. We discuss the implications of these findings for early identification of at-risk students and the design of analytics dashboards for advisors and tutors.
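The pipeline the abstract describes — window the daily clickstream around assignment deadlines, extract per-window interaction features, then project them onto uncorrelated principal components — can be sketched as follows. This is a minimal illustration on synthetic data, not the authors' implementation: the deadline dates, overlap length, and feature choices (total clicks, mean daily activity, active days) are assumptions for demonstration, and PCA is computed directly via SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic daily click counts: 200 students x 70 days (~10 weeks).
clicks = rng.poisson(lam=3.0, size=(200, 70)).astype(float)

# Hypothetical assignment submission dates (day indices). Each window runs
# from shortly before the previous deadline to the current one, so window
# sizes vary with the assessment schedule and adjacent windows overlap.
deadlines = [14, 35, 63]
overlap = 3

def window_features(daily, deadlines, overlap):
    """Extract simple per-window engagement features for every student."""
    feats, start = [], 0
    for end in deadlines:
        seg = daily[:, max(start - overlap, 0):end]
        feats.append(np.column_stack([
            seg.sum(axis=1),        # total interactions in the window
            seg.mean(axis=1),       # mean daily activity
            (seg > 0).sum(axis=1),  # number of active days
        ]))
        start = end
    return np.hstack(feats)

X = window_features(clicks, deadlines, overlap)

# PCA via SVD: center the features, decompose, and keep the leading
# components that explain ~95% of the variance.
Xc = X - X.mean(axis=0)
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
var = s**2 / (Xc.shape[0] - 1)
k = int(np.searchsorted(np.cumsum(var) / var.sum(), 0.95)) + 1
scores = Xc @ Vt[:k].T  # linearly uncorrelated component scores

print(X.shape, scores.shape)
```

The component scores would then feed a classifier such as the decision tree or log-linear model compared in the chapter; because the columns of `scores` are uncorrelated by construction, a log-linear model can work well on this sparser representation.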
Copyright information
© 2020 Springer Nature Switzerland AG
About this chapter
Cite this chapter
Poitras, E.G., Behnagh, R.F., Bouchet, F. (2020). A Dimensionality Reduction Method for Time Series Analysis of Student Behavior to Predict Dropout in Massive Open Online Courses. In: Ifenthaler, D., Gibson, D. (eds) Adoption of Data Analytics in Higher Education Learning and Teaching. Advances in Analytics for Learning and Teaching. Springer, Cham. https://doi.org/10.1007/978-3-030-47392-1_20
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-47391-4
Online ISBN: 978-3-030-47392-1
eBook Packages: Education (R0)