Open Access (CC BY 4.0 license). Published by De Gruyter Open Access, October 13, 2020

Predicting Higher Education Grades using Strategies Correcting for Panel Attrition

  • Marco Giese
From the journal Open Education Studies

Abstract

This study aims to forecast the final grade of the first higher education degree, a prediction of considerable interest to higher education institutions implementing early warning systems, to students themselves, and to potential employers. The analysis is based on the National Educational Panel Study (NEPS), a large German dataset covering many aspects of students’ (educational) lives. Because panel attrition affects 35% of participants, the Heckman correction and the inverse probability weighting (IPW) estimator are used to reduce estimation bias. Two scenarios are distinguished: one excludes students who dropped out, and the other includes them with a grade of 5.0. Some predictors have significant parameter estimates in the first scenario but not in the second, or vice versa, which indicates that dropout and study performance are not driven by the same variables. To obtain an early prediction of grades, only variables from the pre-university period are included in the first step; variables from the early study phase are added afterward. For the IPW estimator, R² improves from 0.202 to 0.593 (dropouts included) when these additional variables are added. The best predictors are secondary school grades, grades in the first exams, and the type of institution.
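To make the idea behind the IPW estimator concrete, the following is a minimal sketch on synthetic data, not the study's own code or data. It assumes a logistic model for the probability of staying in the panel and a linear model for the grade; the variable names (`school_grade`, `aux`, `final_grade`), the coefficients, and the helper functions are all hypothetical and chosen only for illustration. Each respondent who stays in the panel is weighted by the inverse of the estimated probability of staying, so that respondents who resemble the dropouts count more.

```python
import numpy as np


def fit_logistic(Z, r, n_iter=25):
    """Logistic regression of a 0/1 vector r on Z via Newton-Raphson.

    Z must already contain an intercept column.
    """
    gamma = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(Z @ gamma)))
        gradient = Z.T @ (r - p)
        hessian = (Z * (p * (1.0 - p))[:, None]).T @ Z
        gamma = gamma + np.linalg.solve(hessian, gradient)
    return gamma


def weighted_least_squares(X, y, w):
    """Solve the weighted normal equations (X' W X) beta = X' W y."""
    XtW = X.T * w
    return np.linalg.solve(XtW @ X, XtW @ y)


def simulate(n=5000, seed=0):
    """Compare unweighted and IPW-weighted regressions under simulated attrition."""
    rng = np.random.default_rng(seed)

    # Hypothetical data: one predictor in the grade model and one auxiliary
    # variable that is observed for everyone, including later non-respondents.
    school_grade = rng.uniform(1.0, 4.0, n)
    aux = rng.normal(0.0, 1.0, n)
    final_grade = 1.0 + 0.5 * school_grade + 0.5 * aux + rng.normal(0.0, 0.3, n)

    # Assumed attrition mechanism: high values of aux make staying less likely.
    p_stay = 1.0 / (1.0 + np.exp(-(0.5 - aux)))
    stayed = rng.random(n) < p_stay

    X = np.column_stack([np.ones(n), school_grade])  # grade model design matrix
    Z = np.column_stack([np.ones(n), aux])           # response model design matrix

    # Naive estimator: ordinary least squares on the remaining respondents.
    beta_naive = weighted_least_squares(
        X[stayed], final_grade[stayed], np.ones(int(stayed.sum()))
    )

    # IPW estimator: estimate P(stay | aux), then weight respondents by 1 / P.
    gamma = fit_logistic(Z, stayed.astype(float))
    p_hat = 1.0 / (1.0 + np.exp(-(Z @ gamma)))
    beta_ipw = weighted_least_squares(
        X[stayed], final_grade[stayed], 1.0 / p_hat[stayed]
    )
    return beta_naive, beta_ipw


beta_naive, beta_ipw = simulate()
print("naive (intercept, slope):", beta_naive)
print("IPW   (intercept, slope):", beta_ipw)
```

In this simulated setup the full-sample regression of the grade on `school_grade` has intercept 1.0 and slope 0.5. Because attrition depends on `aux`, which also shifts the grade, the unweighted fit on respondents should understate the expected grade, while the weighted fit should land close to the full-sample values. The Heckman correction mentioned in the abstract addresses the same problem differently, by adding a selection term to the outcome equation, and is not shown here.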

Received: 2020-06-12
Accepted: 2020-08-28
Published Online: 2020-10-13

© 2020 Marco Giese, published by De Gruyter

This work is licensed under the Creative Commons Attribution 4.0 International License.

Downloaded on 1.5.2024 from https://www.degruyter.com/document/doi/10.1515/edu-2020-0123/html