Dropout prediction model in MOOC based on clickstream data and student sample weight

  • Methodologies and Application
  • Soft Computing

Abstract

The high dropout rate of massive open online courses (MOOCs) has seriously hindered their popularity and promotion. Predicting which students will drop out, so that interventions can be made as early as possible, has therefore become an active research topic. Students in MOOCs differ widely in learning behaviors, learning habits, and learning time. Because the performance of machine learning-based classifiers depends heavily on the quality of the training samples, different student samples affect the prediction performance of a machine learning-based dropout prediction model (DPM) to different degrees. To address this problem, this paper proposes a new machine learning-based DPM. Since the traditional neighborhood concept is independent of sample labels, a new neighborhood definition, the max neighborhood, is first given; it depends on both the distance between samples and the labels of the samples. Then, an algorithm for calculating the initial weight of each student sample is developed from the definition of the max neighborhood, in contrast to the common practice of selecting initial values at random. Next, the initial weights of the student samples are further optimized using an intelligent optimization method. Finally, classifiers trained on the weighted training samples are used as the DPM. Direct observation and statistical testing of experimental results on public data sets indicate that training sample weighting and intelligent optimization can significantly improve the predictive performance of the DPM.
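The abstract does not spell out the max neighborhood, so the following is only a minimal sketch of how a label-aware neighborhood could yield initial sample weights. It assumes, purely for illustration, that the neighborhood of a sample is the open ball whose radius is the distance to the nearest sample with a different label, and that the weight is the share of same-label samples inside that ball. The function name `max_neighborhood_weights` and this weighting rule are hypothetical and are not taken from the paper.

```python
import math


def max_neighborhood_weights(X, y):
    """Illustrative initial weights from a label-aware neighborhood.

    ASSUMPTION (not the paper's definition): the neighborhood of sample i is
    the open ball around X[i] whose radius is the distance to the nearest
    sample carrying a different label. The weight of sample i is the fraction
    of same-label samples (including i itself) that fall inside that ball.
    """
    n = len(X)
    weights = []
    for i in range(n):
        # Euclidean distance from sample i to every sample.
        dists = [math.dist(X[i], X[j]) for j in range(n)]
        other = [dists[j] for j in range(n) if y[j] != y[i]]
        same_total = sum(1 for j in range(n) if y[j] == y[i])
        if not other:
            # Only one class present: every sample is fully trusted.
            weights.append(1.0)
            continue
        radius = min(other)
        inside = sum(
            1 for j in range(n) if y[j] == y[i] and dists[j] < radius
        )
        weights.append(inside / same_total)
    return weights


# Toy one-dimensional data: the sample at 5.5 is labeled 0 but sits next to
# the label-1 samples, so it receives a low weight under this assumption.
X = [[0.0], [1.0], [2.0], [5.5], [6.0], [7.0]]
y = [0, 0, 0, 0, 1, 1]
weights = max_neighborhood_weights(X, y)
print(weights)
```

Under this assumed rule, samples deep inside their own class get weights near 1, while samples close to the other class get small weights. Weights of this kind could then be refined by an optimizer and passed to any classifier that accepts per-sample weights during training.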

Author information

Corresponding author

Correspondence to Cong Jin.

Ethics declarations

Conflict of interest

The author declares that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Jin, C. Dropout prediction model in MOOC based on clickstream data and student sample weight. Soft Comput 25, 8971–8988 (2021). https://doi.org/10.1007/s00500-021-05795-1

  • DOI: https://doi.org/10.1007/s00500-021-05795-1