
Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models

  • Original Paper
AStA Advances in Statistical Analysis

Abstract

Data-based methods and statistical models have received special attention in the study of sports injuries as a means to gain an in-depth understanding of their risk factors and mechanisms. The objective of this work is to evaluate the use of shared frailty Cox models for predicting the occurrence of sports injuries, and to compare their performance using different sets of variables selected by several regularized variable selection approaches. The study is motivated by characteristics commonly found in sports injury data, which usually include a small sample size and an even smaller number of injuries, coupled with a large number of potentially influential variables. Hence, we conduct a simulation study to address these statistical challenges and to explore regularized Cox model strategies combined with shared frailty models in different controlled scenarios. We show that predictive performance greatly improves as more player observations become available. Methods that result in sparse models and favour interpretability, e.g. Best Subset Selection and Boosting, are preferred when the sample size is small. We also include a real case study of injuries among female football players of a Spanish football club.
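To make the shared frailty setting concrete, the following Python sketch generates recurrent injury data of the kind such a simulation study works with. It is an illustration under stated assumptions, not the data-generating design used in the paper: the function name `simulate_recurrent_injuries`, the constant baseline hazard, the binary covariates and all parameter values are hypothetical. Each player i has hazard h_i(t) = z_i · λ0 · exp(x_i'β), where the frailty z_i is shared by all of that player's injuries and follows a gamma distribution with mean 1 and variance θ. The output uses the counting-process (start, stop, event) layout, which is, for example, the form taken by `Surv(start, stop, event)` in the R survival package.

```python
import math
import random


def simulate_recurrent_injuries(n_players, beta, theta, base_rate, follow_up, seed=0):
    """Simulate recurrent injuries under a shared gamma frailty Cox model.

    Hypothetical, illustrative assumptions (not the paper's design):
    - constant baseline hazard `base_rate` (exponential gap times),
    - binary covariates, each equal to 1 with probability 0.5,
    - frailty z ~ Gamma(shape=1/theta, scale=theta): mean 1, variance theta,
    - administrative censoring at `follow_up`.
    Returns a list of (player, start, stop, event) rows.
    """
    if theta <= 0:
        raise ValueError("theta (frailty variance) must be positive")
    rng = random.Random(seed)
    rows = []
    for player in range(n_players):
        # Hypothetical binary covariates for this player.
        x = [1.0 if rng.random() < 0.5 else 0.0 for _ in beta]
        # Shared frailty: multiplies the hazard of every injury of this player.
        z = rng.gammavariate(1.0 / theta, theta)
        rate = z * base_rate * math.exp(sum(b * xj for b, xj in zip(beta, x)))
        start = 0.0
        while True:
            # Time until the next injury is exponential with the player's rate.
            stop = start + rng.expovariate(rate)
            if stop >= follow_up:
                # No further injury before the end of follow-up: censored row.
                rows.append((player, start, follow_up, 0))
                break
            rows.append((player, start, stop, 1))
            start = stop
    return rows


if __name__ == "__main__":
    rows = simulate_recurrent_injuries(
        n_players=30, beta=[0.7, -0.4], theta=0.5,
        base_rate=0.01, follow_up=300.0, seed=42,
    )
    injuries = sum(event for _, _, _, event in rows)
    print(f"{len(rows)} rows, {injuries} injuries across 30 players")
```

Because the baseline hazard is constant, simulating gap times between injuries is equivalent to simulating on the calendar-time scale, so the same rows can be analysed with either formulation. Small values of `n_players` and `base_rate` reproduce the situation described above, with few players and even fewer injuries, while letting θ approach 0 makes players' injuries nearly independent.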


Fig. 1
Fig. 2
Fig. 3
Fig. 4
Fig. 5
Fig. 6

Acknowledgements

This research was supported by the Basque Government through the BERC Programme 2018–2021; by the Spanish Ministry of Science, Innovation and Universities (MICINN) and FEDER through the BCAM Severo Ochoa excellence accreditation SEV-2017-0718 and project PID2020-115882RB-I00, funded by AEI/FEDER, UE (acronym “S3M1P4R”); and by the German Federal Ministry of Education and Research (BMBF) under Grant No. 01IS18036A. The authors take full responsibility for the content of this work. The authors are also grateful to the two anonymous reviewers for their valuable and constructive comments, which led to an improved manuscript.

Author information

Corresponding author

Correspondence to Dae-Jin Lee.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Zumeta-Olaskoaga, L., Weigert, M., Larruskain, J. et al. Prediction of sports injuries in football: a recurrent time-to-event approach using regularized Cox models. AStA Adv Stat Anal 107, 101–126 (2023). https://doi.org/10.1007/s10182-021-00428-2

  • DOI: https://doi.org/10.1007/s10182-021-00428-2