Abstract
The high dropout rate of massive open online courses (MOOCs) has seriously hindered their popularity and adoption. Predicting which students will drop out, so that intervention can begin as early as possible, has therefore become an active research topic. MOOC students differ widely in learning behavior, learning habits, and time spent learning, so different student samples contribute differently to the predictive performance of a machine learning-based dropout prediction model (DPM), because the performance of a machine learning classifier depends heavily on the quality of its training samples. To address this problem, this paper proposes a new machine learning-based DPM. Because the traditional neighborhood concept ignores sample labels, a new neighborhood definition, the max neighborhood, is first given; it depends on both the distance between samples and their labels. Then, an algorithm for computing the initial weight of each student sample is developed from the max neighborhood definition, in contrast to the common practice of selecting initial values at random. Next, the initial weights are further optimized using an intelligent optimization method. Finally, classifiers trained on the weighted training samples are used as the DPM. Experimental results on public data sets, assessed by both direct observation and statistical testing, indicate that training sample weighting and intelligent optimization can significantly improve the predictive performance of the DPM.
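The abstract does not state the formal definition of the max neighborhood, so the following is only a minimal sketch of one label-aware weighting scheme in that spirit, not the paper's algorithm. It assumes that a sample's neighborhood radius is the distance to its nearest differently labeled sample, and that its initial weight is proportional to the number of same-label samples strictly inside that radius. The function name `label_aware_initial_weights` and both of these choices are hypothetical.

```python
import math


def label_aware_initial_weights(X, y):
    """Illustrative initial sample weights (an assumption, not the paper's formula).

    X: list of feature tuples, y: list of class labels (same length).
    Returns weights normalized to sum to 1.
    """
    n = len(X)
    counts = []
    for i in range(n):
        # Assumed radius: distance to the closest sample with a different label.
        other = [math.dist(X[i], X[j]) for j in range(n) if y[j] != y[i]]
        radius = min(other) if other else math.inf
        # Count same-label samples strictly inside the radius (includes the
        # sample itself whenever the radius is positive).
        inside = sum(
            1 for j in range(n)
            if y[j] == y[i] and math.dist(X[i], X[j]) < radius
        )
        counts.append(inside)
    total = sum(counts)
    if total == 0:
        # Degenerate case (e.g. identical points with conflicting labels):
        # fall back to uniform weights.
        return [1.0 / n] * n
    return [c / total for c in counts]


# Toy 1-D data: three students of class 0, two of class 1.
X = [(0.0,), (1.0,), (2.0,), (3.0,), (10.0,)]
y = [0, 0, 0, 1, 1]
weights = label_aware_initial_weights(X, y)
```

Under these assumptions, samples that sit well inside their own class receive larger weights than samples near the class boundary. Weights of this kind could then be refined by an optimizer and supplied to a classifier that supports per-sample weights, for example through the `sample_weight` argument that many scikit-learn estimators accept in `fit`.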
Ethics declarations
Conflict of interest
The author declares that they have no conflict of interest.
About this article
Cite this article
Jin, C. Dropout prediction model in MOOC based on clickstream data and student sample weight. Soft Comput 25, 8971–8988 (2021). https://doi.org/10.1007/s00500-021-05795-1
DOI: https://doi.org/10.1007/s00500-021-05795-1