Prediction of absenteeism in public schools teachers with machine learning

DOI:

https://doi.org/10.11606/s1518-8787.2021055002677

Keywords:

Absenteeism, Risk factors, Supervised machine learning, School teachers, Early childhood education

Abstract

OBJECTIVE: To predict the risk of absence from work due to morbidity among early childhood education teachers in municipal public schools, using machine learning algorithms.

METHODS: This is a cross-sectional study using secondary, public, and anonymous data from the Relação Anual de Informações Sociais (RAIS), covering early childhood education teachers who worked in municipal public schools in the state of São Paulo between 2014 and 2018 (n = 174,294). Data on the average number of students per class and the number of inhabitants in each municipality were linked to these records. The data were split by period into training and test sets: records from 2014 to 2016 (n = 103,357) were used to train five predictive models, and records from 2017 to 2018 (n = 70,937) were used to test their performance on new data. Predictive performance was evaluated using the area under the ROC curve (AUROC).

RESULTS: All five algorithms achieved an AUROC above 0.76. The best-performing algorithm (artificial neural networks) achieved an AUROC of 0.79, with accuracy of 71.52%, sensitivity of 72.86%, specificity of 70.52%, and kappa of 0.427 on the test data.

CONCLUSION: Sickness absence among public school teachers can be predicted with machine learning using public data. The best algorithm achieved a higher AUROC than the reference model (logistic regression). Such algorithms can contribute to more accurate predictions in public health and occupational health, making it possible to monitor and help prevent absence due to morbidity among these workers.
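The evaluation design described in the abstract (a split by calendar year, followed by AUROC, accuracy, sensitivity, specificity, and kappa on the held-out years) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the record layout (`year`, `label`, `score`), and the 0.5 decision threshold are assumptions made only for illustration, and the example records are synthetic.

```python
def temporal_split(records, last_train_year):
    """Train on years <= last_train_year, test on later years (assumed layout)."""
    train = [r for r in records if r["year"] <= last_train_year]
    test = [r for r in records if r["year"] > last_train_year]
    return train, test


def auroc(labels, scores):
    """AUROC from the Mann-Whitney rank-sum statistic, averaging ranks over ties."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    rank_sum_pos = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def classification_metrics(labels, predictions):
    """Accuracy, sensitivity, specificity, and Cohen's kappa for binary labels."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    # Agreement expected by chance, from the marginal totals.
    p_chance = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    return {
        "accuracy": accuracy,
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "kappa": (accuracy - p_chance) / (1 - p_chance),
    }


if __name__ == "__main__":
    # Synthetic records: "score" stands in for a model's predicted risk.
    records = [
        {"year": 2014, "label": 0, "score": 0.20},
        {"year": 2015, "label": 1, "score": 0.70},
        {"year": 2016, "label": 0, "score": 0.40},
        {"year": 2017, "label": 1, "score": 0.80},
        {"year": 2017, "label": 0, "score": 0.30},
        {"year": 2018, "label": 1, "score": 0.45},
        {"year": 2018, "label": 0, "score": 0.60},
    ]
    train, test = temporal_split(records, last_train_year=2016)
    y_true = [r["label"] for r in test]
    y_score = [r["score"] for r in test]
    y_pred = [1 if s >= 0.5 else 0 for s in y_score]  # assumed threshold
    print(len(train), len(test))
    print(auroc(y_true, y_score))
    print(classification_metrics(y_true, y_pred))
```

Splitting by year rather than at random means the test set contains only records from after the training period, which is closer to how such a model would be used in practice. In a real analysis the scores would come from a fitted model rather than being supplied by hand.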

Published

2021-06-14

Section

Original Articles

How to Cite

Fernandes, F. T., & Chiavegatto Filho, A. D. P. (2021). Prediction of absenteeism in public schools teachers with machine learning. Revista de Saúde Pública, 55, 23. https://doi.org/10.11606/s1518-8787.2021055002677