A Dimensionality Reduction Method for Time Series Analysis of Student Behavior to Predict Dropout in Massive Open Online Courses

  • Chapter
Adoption of Data Analytics in Higher Education Learning and Teaching

Abstract

Student attrition is one of the most frequently cited problems with massive open online courses. Although a growing body of research investigates the factors that lead students to withdraw from a course, there is also a need for data-driven solutions that detect attrition early enough for remedial action to be taken. In this case study, we examine the use of several data preprocessing techniques to model attrition on the basis of students’ interactions with course materials and resources. Data for this study were obtained from the Open University Learning Analytics Dataset (OULAD), which enables the analysis of daily summaries of clickstream data. The data were segmented using a variable-sized overlapping window so that assignment submission dates could be taken into account as a context-sensitive factor in student attrition. In each sliding time window, features were extracted to characterize student interactions with curricular materials and then converted to a set of linearly uncorrelated principal components. The analysis demonstrates that relatively accurate detection of the likelihood of students dropping out of a course can be attained within approximately 10 weeks of the course start date, or before completion of assignments worth 20% of the final grade. Although a decision tree model outperformed alternative approaches to modeling student attrition, a log-linear model performed comparably well on the sparse representation of student interactions obtained through principal components analysis. We discuss the implications of these findings for early identification of at-risk students and the design of analytics dashboards for advisors and tutors.
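
The preprocessing steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`assignment_windows`, `window_features`, `first_principal_component`), the 7-day overlap, and the four summary features are all assumptions made for illustration. The power-iteration routine recovers only the first principal component, whereas a full analysis would use a complete decomposition and would normally standardize the features first.

```python
from statistics import mean


def assignment_windows(deadlines, overlap_days=7, start_day=0):
    """Build one (start, end) day range per assignment deadline.

    Each window ends at a deadline and begins `overlap_days` before the
    previous deadline, so the window length varies with deadline spacing
    and consecutive windows overlap. The overlap size is an assumption.
    """
    windows = []
    previous = start_day
    for deadline in sorted(deadlines):
        windows.append((max(start_day, previous - overlap_days), deadline))
        previous = deadline
    return windows


def window_features(daily_clicks, window):
    """Summarize one student's daily click counts in a window (end exclusive).

    `daily_clicks` maps a day index to a click count; missing days count as 0.
    Returns [total clicks, active days, peak daily clicks, mean daily clicks].
    These four features are illustrative, not the chapter's feature set.
    """
    start, end = window
    counts = [daily_clicks.get(day, 0) for day in range(start, end)]
    if not counts:
        return [0, 0, 0, 0.0]
    active_days = sum(1 for c in counts if c > 0)
    return [sum(counts), active_days, max(counts), mean(counts)]


def first_principal_component(rows, iterations=200):
    """Estimate the first principal component by power iteration.

    `rows` is a list of equal-length feature vectors (one per student).
    Returns (unit direction vector, projection score for each row).
    """
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered features.
    cov = [
        [sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]
    # Start from an all-ones vector; this fails only if that vector is
    # exactly orthogonal to the leading eigenvector.
    v = [1.0] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        if norm == 0:
            break
        v = [x / norm for x in w]
    scores = [sum(c[j] * v[j] for j in range(d)) for c in centered]
    return v, scores
```

In use, `assignment_windows` would be called once per course with its assignment cut-off days, `window_features` once per student and window, and the resulting feature rows passed to a principal components routine before fitting a classifier such as a decision tree or a logistic (log-linear) model.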

Author information

Corresponding author

Correspondence to Eric G. Poitras.

Copyright information

© 2020 Springer Nature Switzerland AG

About this chapter

Cite this chapter

Poitras, E.G., Behnagh, R.F., Bouchet, F. (2020). A Dimensionality Reduction Method for Time Series Analysis of Student Behavior to Predict Dropout in Massive Open Online Courses. In: Ifenthaler, D., Gibson, D. (eds) Adoption of Data Analytics in Higher Education Learning and Teaching. Advances in Analytics for Learning and Teaching. Springer, Cham. https://doi.org/10.1007/978-3-030-47392-1_20

  • DOI: https://doi.org/10.1007/978-3-030-47392-1_20

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-47391-4

  • Online ISBN: 978-3-030-47392-1

  • eBook Packages: Education, Education (R0)